Preface


This book is based on a course in real analysis offered to advanced undergraduates and 
first-year graduate students at Bowling Green State University. In many respects it is 
a perfectly ordinary first course in analysis, but there are some important differences. 
For one, the typical audience for the class includes many nonspecialists, students of 
statistics, economics, and education, as well as students of pure and applied mathematics 
at the undergraduate and graduate levels. What’s more, the students come from a wide 
variety of backgrounds. This makes the course something of a challenge to teach. The 
material must be presented efficiently, but without sacrificing the less well-prepared 
student. The course must be essentially self-contained, but not so pedestrian that the 
more experienced student is bored. And the course should offer something of value 
to both the specialist and the nonspecialist. The following pages contain my personal 
answer to this challenge. 

To begin, I make a few compromises: Extra details are given on metric and normed 
linear spaces in place of general topology, and a thorough attack on Riemann-Stieltjes 
and Lebesgue integration on the line in place of abstract measure and integration. On 
the other hand, I avoid euphemisms and specialized notation and, instead, attempt to 
remain faithful to the terminology and notation used in more advanced settings. Next, 
to make the course more meaningful to the nonspecialist (and more fun for me), I toss 
in a few historical tidbits along the way. 

By way of prerequisites, I assume that the reader has had at least one semester of 
advanced calculus or real analysis at the undergraduate level. For example, I assume 
that the reader has been exposed to (and is moderately comfortable with) an “ε-δ”
presentation of convergence, completeness, and continuity on the real line; a few “name” 
theorems (Bolzano-Weierstrass, for one); and a rigorous definition of the Riemann 
integral, but I do not presuppose any real depth or breadth of understanding of these 
topics beyond their basics. 

The writing style throughout is deliberately conversational. While I have tried to 
be as precise as possible, the odd detail here and there is sometimes left to the reader, 
which is reflected by the use of a parenthetical (Why?) or (How?). The decision to 
omit these few details is motivated by the hope that the student who can successfully 
navigate through this “guided tour” of analysis, who is willing to get involved with the 
mathematics at hand, will come away with something valuable in the process. 

You will notice, too, that I don’t try to keep secrets. Important ideas are often 
broached long before they are needed in the formal presentation. A particular theme 
may be repeated in several different forms before it is made flesh. This repetition is 
necessary if new definitions and new ideas are to seem natural and appropriate. Once
such an idea is finally made formal, there is usually a real savings in the “definition- 
theorem-proof” cycle. The student who has held on to the thread can usually see the 
connections without difficulty or fanfare. 

The book is divided, rather naturally, into three parts. The first part concerns general
metric and normed spaces. This serves as a beginner’s guide to general topology.
The second part serves as a transition from the discussion of abstract spaces to concrete
spaces of functions. The emphasis here is on the space of continuous real-valued
functions and a few of its relatives. A discussion of Riemann-Stieltjes integration is 
included to set the stage for the later transition to Lebesgue measure and integration in 
the third and last part. A more detailed description of the contents is given below. 

Where to start is always problematic; a certain amount of review is arguably necessary.
Chapters One, Two, and Ten, along with their references, provide a source for such
review (albeit incomplete at times). These chapters serve as a rather long introduction 
to Parts One and Two, primarily spelling out notation and recalling facts from advanced 
calculus, but also making the course somewhat self-contained. 

The “real” course begins in Chapter Three, with metric and normed spaces, with 
frequent emphasis on normed spaces. From there we collect “C” words: convergence, 
continuity, connectedness, completeness, compactness, and category. 

Part Two concerns spaces of functions. The reader will find a particularly heavy 
emphasis on the interplay between algebra, topology, and analysis here, which serves as 
a transition from the “sterile” abstraction of metric spaces to the “practical” abstraction 
of such results as the Weierstrass theorem and the Riesz representation theorem. 

Part Three concerns Lebesgue measure and integration on the real line, culminating 
in Lebesgue’s differentiation theorem. While I have opted for a “hands-on” approach to 
Lebesgue measure on the line, I have not been shy about using the machinery developed 
in the first two parts of the book. In other words, rather than presenting measure theory 
from an abstract point of view, with Lebesgue measure as a special case, I have chosen 
to concentrate solely on Lebesgue measure on the line, but from as lofty a viewpoint as 
I can muster. This approach is intended to keep the discussion down to earth while still 
easing the transition to abstract measure theory and functional analysis in subsequent 
courses. 

This is an ambitious list of topics for two semesters. In actual practice, several topics 
can safely be left for the interested and ambitious reader to discover independently. 
For example, the sections on completions, equivalent metrics, infinitely differentiable 
functions, equicontinuity, continuity and category, and the Riesz representation theorem 
(among others) could be omitted. 

A few words are in order about the exercises. I included as many as I could manage 
without undermining the text. They come in all shapes and sizes. And, like the text 
itself, there is a fair amount of built-in repetition. But the exercises are intended to be 
part of the presentation, not just a few stray thoughts appended to the end of a chapter. 
For this reason, the exercises are peppered throughout the text; each is placed near what 
I consider to be its natural position in the flow of ideas. 

The beginner is encouraged to at least read through the exercises; those that look
too difficult at first may seem easier on their third or fourth appearance. And the key
ideas come up at least that often. A word of warning to the instructor in this regard:
Some restraint is needed in assigning certain problems too early. There are occasional 
“sleepers” (deceptively difficult problems), intended to serve more as brainwashing 
than as homework. A veteran will have little trouble spotting them. And a word of 
warning to the student, too: Since the exercises are part of the text, a few important 
notions make their first appearance in an exercise. Be on the lookout for bold type; it’s 
used to highlight key words and will help you spot these important exercises. 

You will notice that certain of the exercises are marked with a small triangle (▷)
in the margin. For a variety of reasons, I have deemed these exercises important for a 
full understanding of the material. Many are straightforward “computations,” some are 
simple detail checking, and at least a few unveil the germs of ideas essential for later 
developments. Again, a veteran will find it easy to distinguish one from the other. In 
my own experience, the marked exercises provide a reasonable source for assignments 
as well as topics for in-class discussion. 

To encourage independent study (and because I enjoyed doing it), I have included 
a short section of “Notes and Remarks” at the end of each chapter. Here I discuss 
additional or peripheral topics of interest, alternate presentations, and historical commentary.
The references cited here include not only primary sources, both technical and
historical, but also various secondary sources, such as survey or expository articles. 

A word or two about organization: Exercises are numbered consecutively within 
a given chapter. However, when referring to a given exercise from outside its home 
chapter, a chapter number is also included. Thus, Exercise 14 refers to the fourteenth 
exercise in the current chapter, while Exercise 3.26 refers to the twenty-sixth exercise in 
Chapter Three. The various lemmas, theorems, corollaries, and examples are likewise 
numbered consecutively within a chapter, without regard to label, and always carry 
the number of the chapter where they reside. This means that the lemma immediately 
following Proposition 10.5 is labeled Lemma 10.6, even if it is the first lemma to appear 
in the chapter, and Lemma 10.6 may well be followed by Theorem 10.7, the second 
theorem in the chapter. In any case, all three items appear in Chapter Ten. 

Many people endured this project with me, and quite a few helped along the way. 
I would not have survived the process had it not been for the constant encouragement 
and expert guidance offered by my friends Patrick Flinn and Stephen Dilworth. Equally 
important were my colleagues Steven Seubert and Kit Chan, who graciously agreed to 
field-test the notes, and who patiently entertained endless discussions of minutiae. Of 
course, a large debt of gratitude is also owed to the many students who suffered through 
early versions of these notes. You have them to thank for each passage that “works” 
(and only me to blame for those that don’t). Finally, copious thanks to my wife Cheryl, 
who, with good humor and affection, indulged my musings and maintained my sanity. 


-N. C. 
CHAPTER ONE


Calculus Review 


Our goal in this chapter is to provide a quick review of a handful of important ideas from 
advanced calculus (and to encourage a bit of practice on these fundamentals). We will 
make no attempt to be thorough. Our purpose is to set the stage for later generalizations 
and to collect together in one place some of the notation that should already be more 
or less familiar. There are sure to be missing details, unexplained terminology, and 
incomplete proofs. On the other hand, since much of this material will reappear in later 
chapters in a more general setting, you will get to see some of the details more than 
once. In fact, you may find it entertaining to refer to this chapter each time an old name 
is spoken in a new voice. If nothing else, there are plenty of keywords here to assist 
you in looking up any facts that you have forgotten. 


The Real Numbers 

First, let’s agree to use a standard notation for the various familiar sets of numbers. $\mathbb{R}$
denotes the set of all real numbers; $\mathbb{C}$ denotes the set of all complex numbers (although
our major concern here is $\mathbb{R}$, we will use complex numbers from time to time); $\mathbb{Z}$ stands
for the integers (negative, zero, and positive); $\mathbb{N}$ is the set of natural numbers (positive
integers); and $\mathbb{Q}$ is the set of rational numbers. We won’t give the set of irrational
numbers its own symbol; rather we’ll settle for writing $\mathbb{R} \setminus \mathbb{Q}$ (the set-theoretic difference
of $\mathbb{R}$ and $\mathbb{Q}$).

We will assume most of the basic algebraic and order properties of these sets, but
we will review a few important ideas. Of greatest importance to us is that the set $\mathbb{R}$ of
real numbers is complete, in more than one sense! First, recall that a subset $A$ of $\mathbb{R}$
is said to be bounded above if there is some $x \in \mathbb{R}$ such that $a \le x$ for all $a \in A$. Any
such number $x$ is called an upper bound for $A$. The real numbers are constructed so
that any nonempty set with an upper bound has, in fact, a least upper bound (l.u.b.).
We won’t give the details of this construction; instead we’ll take this property as an
axiom:

The Least Upper Bound Axiom (sometimes called the completeness axiom). 

Any nonempty set of real numbers with an upper bound has a least upper bound. 

That is, if $A \subset \mathbb{R}$ is nonempty and bounded above, then there is a number $s \in \mathbb{R}$
satisfying: (i) $s$ is an upper bound for $A$; and (ii) if $x$ is any upper bound for $A$, then
$s \le x$. In other words, if $y < s$, then we must have $y < a \le s$ for some $a \in A$.
(Why?) We even have a notation for this: In this case we write $s = \text{l.u.b.}\, A = \sup A$
(for supremum). If $A$ fails to be bounded above, we set $\sup A = +\infty$, and if
$A = \emptyset$, we put $\sup A = -\infty$ since, after all, every real number is an upper bound
for $A$.

Example 1.1 

$\sup(-\infty, 1) = 1$ and $\sup\{2 - (1/n) : n = 1, 2, \ldots\} = 2$. Notice, please, that $\sup A$
is not necessarily an element of $A$.

An immediate consequence of the least upper bound axiom is that we also have 
greatest lower bounds (g.l.b.), just by turning things around. The details are left as 
Exercise 1. 


EXERCISE 

▷ 1. If $A$ is a nonempty subset of $\mathbb{R}$ that is bounded below, show that $A$ has a greatest
lower bound. That is, show that there is a number $m \in \mathbb{R}$ satisfying: (i) $m$ is a lower
bound for $A$; and (ii) if $x$ is a lower bound for $A$, then $x \le m$. [Hint: Consider the
set $-A = \{-a : a \in A\}$ and show that $m = -\sup(-A)$ works.]


We have a notation for greatest lower bounds, too, of course: We write $m = \text{g.l.b.}\, A = \inf A$
(for infimum). It follows from Exercise 1 that $\inf A = -\sup(-A)$. Thus, $\inf A = -\infty$
if $A$ isn’t bounded below, and $\inf \emptyset = +\infty$. In case a set $A$ is both bounded above
and bounded below, we simply say that $A$ is bounded.


EXERCISES 

2. Let A be a bounded subset of R containing at least two points. Prove: 

(a) $-\infty < \inf A < \sup A < +\infty$.

(b) If $B$ is a nonempty subset of $A$, then $\inf A \le \inf B \le \sup B \le \sup A$.

(c) If $B$ is the set of all upper bounds for $A$, then $B$ is nonempty, bounded below,
and $\inf B = \sup A$.

▷ 3. Establish the following apparently different (but “fancier”) characterization of
the supremum. Let $A$ be a nonempty subset of $\mathbb{R}$ that is bounded above. Prove that
$s = \sup A$ if and only if (i) $s$ is an upper bound for $A$, and (ii) for every $\varepsilon > 0$, there
is an $a \in A$ such that $a > s - \varepsilon$. State and prove the corresponding result for the
infimum of a nonempty subset of $\mathbb{R}$ that is bounded below.

Recall that a sequence $(x_n)$ of real numbers is said to converge to $x \in \mathbb{R}$ if, for every
$\varepsilon > 0$, there is a positive integer $N$ such that $|x_n - x| < \varepsilon$ whenever $n \ge N$. In this
case, we call $x$ the limit of the sequence $(x_n)$ and write $x = \lim_{n\to\infty} x_n$.

▷ 4. Let $A$ be a nonempty subset of $\mathbb{R}$ that is bounded above. Show that there
is a sequence $(x_n)$ of elements of $A$ that converges to $\sup A$.
5. Suppose that $a_n \le b$ for all $n$, and that $a = \lim_{n\to\infty} a_n$ exists. Show that $a \le b$.
Conclude that $a \le \sup_n a_n = \sup\{a_n : n \in \mathbb{N}\}$.

▷ 6. Prove that every convergent sequence of real numbers is bounded. Moreover, if
$(a_n)$ is convergent, show that $\inf_n a_n \le \lim_{n\to\infty} a_n \le \sup_n a_n$.


As an application of the least upper bound axiom, we next establish the Archimedean 
property in R. 

Lemma 1.2. If $x$ and $y$ are positive real numbers, then there is some positive integer
$n$ such that $nx > y$.

Proof. Suppose that no such $n$ existed; that is, suppose that $nx \le y$ for all $n \in \mathbb{N}$.
Then $A = \{nx : n \in \mathbb{N}\}$ is bounded above by $y$, and so $s = \sup A$ is finite. Now,
since $s - x < s$, we must have some element of $A$ in between; that is, $s - x < nx \le s$
for some $n \in \mathbb{N}$. But then $s < (n+1)x$. And what’s wrong? Well, since $(n+1)x \in A$,
we should instead have $(n+1)x \le s$. This contradiction tells us that it is unacceptable
to have $nx \le y$ for all $n$, and so we must have $nx > y$ for some $n$. □

This simple observation does a lot of good: 

Theorem 1.3. If $a$ and $b$ are real numbers with $a < b$, then there is a rational $r \in \mathbb{Q}$
with $a < r < b$.

Proof. Since $b - a > 0$, we may apply Lemma 1.2 to get a positive integer $q$
such that $q(b - a) > 1$. But if $qa$ and $qb$ differ by more than 1, there must be some
integer in between. That is, there is some $p \in \mathbb{Z}$ with $qa < p < qb$. Thus
$a < p/q < b$. □
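The proof of Theorem 1.3 is constructive enough to run. Here is a minimal Python sketch. It assumes exact rational endpoints stored as the standard `fractions.Fraction` type, our choice for sidestepping floating-point rounding. The function name `rational_between` is hypothetical. The choice of $q$ is the Archimedean step of Lemma 1.2 with $x = b - a$ and $y = 1$, and $p$ is the first integer strictly greater than $qa$.

```python
from fractions import Fraction
from math import floor


def rational_between(a: Fraction, b: Fraction) -> Fraction:
    """Return a rational p/q with a < p/q < b, following the proof of Theorem 1.3."""
    if not a < b:
        raise ValueError("need a < b")
    # Archimedean step (Lemma 1.2 with x = b - a, y = 1): q > 1/(b - a), so q(b - a) > 1.
    q = floor(1 / (b - a)) + 1
    # Smallest integer strictly greater than qa; it satisfies p <= qa + 1 < qb.
    p = floor(q * a) + 1
    return Fraction(p, q)


print(rational_between(Fraction(1, 3), Fraction(1, 2)))  # 3/7
```

With $a = 1/3$ and $b = 1/2$ we get $q = 7$ and $p = 3$. The answer is generally not the "simplest" rational in the interval, only one that the proof guarantees.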


EXERCISES 

7. If $a < b$, then there is also an irrational $x \in \mathbb{R} \setminus \mathbb{Q}$ with $a < x < b$. [Hint: Find
an irrational of the form $p\sqrt{2}/q$.]

8. Given $a < b$, show that there are, in fact, infinitely many distinct rationals
between $a$ and $b$. The same goes for irrationals, too.

9. Show that the least upper bound axiom also holds in $\mathbb{Z}$ (i.e., each nonempty subset
of $\mathbb{Z}$ with an upper bound in $\mathbb{Z}$ has a least upper bound in $\mathbb{Z}$), but that it fails to
hold in $\mathbb{Q}$.


It follows from Theorem 1.3 that every real number is the limit of a monotone (i.e., 
increasing or decreasing) sequence of rationals (or irrationals). We’ll want to take full 
advantage of this fact, and we’ll see at least one more reason why it’s true. First, though, 
let’s give a formal statement of the property behind it. 
Theorem 1.4. A monotone, bounded sequence of real numbers converges.

Proof. Let $(x_n) \subset \mathbb{R}$ be monotone and bounded. We first suppose that $(x_n)$ is
increasing. Now, since $(x_n)$ is bounded, we may set $x = \sup_n x_n$ (a real number). We
will show that $x = \lim_{n\to\infty} x_n$.

Let $\varepsilon > 0$. Since $x - \varepsilon < x = \sup_n x_n$, we must have $x_N > x - \varepsilon$ for some $N$. But
then, for any $n \ge N$, we have $x - \varepsilon < x_N \le x_n \le x$. (Why?) That is, $|x - x_n| < \varepsilon$
for all $n \ge N$. Consequently, $(x_n)$ converges and $x = \sup_n x_n = \lim_{n\to\infty} x_n$.

Finally, if $(x_n)$ is decreasing, consider the increasing sequence $(-x_n)$. From the
first part of the proof, $(-x_n)$ converges to $\sup_n(-x_n) = -\inf_n x_n$. It then follows
that $(x_n)$ converges to $\inf_n x_n$. □

In subsequent chapters we will consider certain properties of the real line that may be 
defined either in terms of sequences or in terms of subsets of R. To better appreciate the 
connection between sequences and sets, we will show how Theorem 1.4 gives a quick proof
of the nested interval theorem. Later in this chapter we will use the nested interval theorem 
to define a strange and beautiful subset of R called the Cantor set. 

The Nested Interval Theorem 1.5. If $(I_n)$ is a sequence of closed, bounded,
nonempty intervals in $\mathbb{R}$ with $I_1 \supset I_2 \supset I_3 \supset \cdots$, then $\bigcap_{n=1}^{\infty} I_n \ne \emptyset$. If, in
addition, $\text{length}(I_n) \to 0$, then $\bigcap_{n=1}^{\infty} I_n$ contains precisely one point.

Proof. Write $I_n = [a_n, b_n]$. Then $I_n \supset I_{n+1}$ means that $a_n \le a_{n+1} \le b_{n+1} \le b_n$
for all $n$. Thus, $a = \lim_{n\to\infty} a_n = \sup_n a_n$ and $b = \lim_{n\to\infty} b_n = \inf_n b_n$
both exist (as finite real numbers) and satisfy $a \le b$. (Why?) Thus we must have
$\bigcap_{n=1}^{\infty} I_n = [a, b]$. Indeed, if $x \in I_n$ for all $n$, then $a_n \le x \le b_n$ for all $n$, and
hence $a \le x \le b$. Conversely, if $a \le x \le b$, then $a_n \le x \le b_n$ for all $n$. That is,
$x \in I_n$ for all $n$. Finally, if $b_n - a_n = \text{length}(I_n) \to 0$, then $a = b$ and so
$\bigcap_{n=1}^{\infty} I_n = \{a\}$. □

Examples 1.6 

(a) Please note that it is essential that the intervals used in the nested interval theorem
be both closed and bounded. Indeed, $\bigcap_{n=1}^{\infty} [n, \infty) = \emptyset = \bigcap_{n=1}^{\infty} (0, 1/n]$.

(b) Suppose that $(I_n)$ is a sequence of closed intervals with $I_n \supset I_{n+1}$ for all $n$ and with
$\text{length}(I_n) \to 0$ as $n \to \infty$. If $\bigcap_{n=1}^{\infty} I_n = \{x\}$, then any sequence of points $(x_n)$,
with $x_n \in I_n$ for all $n$, must converge to $x$. (Why?)

A sequence of sets $(I_n)$ with $I_n \supset I_{n+1}$ for all $n$ is often said to be a decreasing
sequence of sets. Thus, the nested interval theorem might be paraphrased by saying
that a decreasing sequence of closed, bounded, nonempty intervals “converges” to a
nonempty set. In this language, the nested interval theorem is at least reminiscent of the
fact that a monotone bounded sequence of real numbers is convergent. And with good reason:
The fact that monotone bounded sequences converge is actually equivalent to the
least upper bound axiom, as is the nested interval theorem. That is, we might just as
well have assumed the conclusion of either Theorem 1.4 or Theorem 1.5 as an axiom
for $\mathbb{R}$ and deduced the existence of least upper bounds as a corollary. As evidence, here
is a proof that the nested interval theorem implies the existence of least upper bounds (this
is similar in spirit to Bolzano’s original proof):

Let $A$ be a nonempty subset of $\mathbb{R}$ that is bounded above. Specifically, let $a_1 \in A$ and let $b_1$
be an upper bound for $A$. For later reference, set $I_1 = [a_1, b_1]$. Now consider the point $x_1 =
(a_1 + b_1)/2$, halfway between $a_1$ and $b_1$. If $x_1$ is an upper bound for $A$, we set $I_2 = [a_1, x_1]$;
otherwise, there is an element $a_2 \in A$ with $a_2 > x_1$. In this case, set $I_2 = [a_2, b_1]$. In either
event, $I_2$ is a closed subinterval of $I_1$ of the form $[a_2, b_2]$, where $a_2 \in A$ and $b_2$ is an upper
bound for $A$. Moreover, $\text{length}(I_2) \le \text{length}(I_1)/2$. We now start the process all over
again, using $I_2$ in place of $I_1$, and obtain a closed subinterval $I_3 = [a_3, b_3] \subset I_2$, where
$a_3 \in A$ and $b_3$ is an upper bound for $A$, with $\text{length}(I_3) \le \text{length}(I_2)/2 \le \text{length}(I_1)/4$.
By induction, we get a sequence of nested closed intervals $I_n = [a_n, b_n]$, where $a_n \in A$
and $b_n$ is an upper bound for $A$, with $\text{length}(I_n) \le \text{length}(I_1)/2^{n-1} \to 0$ as $n \to \infty$. The
single point $b \in \bigcap_{n=1}^{\infty} I_n$ is the least upper bound for $A$. (Why?)
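The halving construction above can be imitated numerically. The sketch below is only an illustration under stated assumptions. We take the particular set $A = \{a > 0 : a^2 < 2\}$ (our choice), for which a point $x > 0$ is an upper bound exactly when $x^2 \ge 2$. The helper names `is_upper_bound` and `bisect_sup` are hypothetical. For this $A$, a midpoint that fails to be an upper bound already lies in $A$, so we use it as the new left endpoint. The text's version instead picks some element of $A$ beyond the midpoint, which is what a general $A$ requires.

```python
from fractions import Fraction


def is_upper_bound(x: Fraction) -> bool:
    # For A = {a > 0 : a*a < 2}, a point x bounds A above exactly when x > 0 and x*x >= 2.
    return x > 0 and x * x >= 2


def bisect_sup(a: Fraction, b: Fraction, steps: int) -> tuple[Fraction, Fraction]:
    """Halve [a, b] repeatedly, keeping a in A and b an upper bound for A."""
    for _ in range(steps):
        mid = (a + b) / 2
        if is_upper_bound(mid):
            b = mid  # next interval is [a, mid]
        else:
            a = mid  # for this particular A, mid itself lies in A
    return a, b


# Start from a_1 = 1 (in A) and b_1 = 2 (an upper bound); the length halves each step.
a, b = bisect_sup(Fraction(1), Fraction(2), 20)
print(float(a), float(b))  # both close to sup A, which is the square root of 2
```

After 20 steps the interval has length exactly $2^{-20}$. The left endpoint is still in $A$ and the right endpoint is still an upper bound, just as in the argument above.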


EXERCISES 

10. Let $a_1 = \sqrt{2}$ and let $a_{n+1} = \sqrt{2 + a_n}$ for $n \ge 1$. Show that $(a_n)$ converges and
find its limit. [Hint: Show that $(a_n)$ is increasing and bounded.]

11. Fix $a > 0$ and let $x_1 > \sqrt{a}$. For $n \ge 1$, define
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right).$$
Show that $(x_n)$ converges and that $\lim_{n\to\infty} x_n = \sqrt{a}$.

12. Suppose that $s_1 > s_2 > 0$ and let $s_{n+1} = \frac{1}{2}(s_n + s_{n-1})$ for $n \ge 2$. Show that
$(s_n)$ converges. [Hint: Show that $(s_{2n-1})$ decreases and $(s_{2n})$ increases.]

▷ 13. Let $a_n \ge 0$ for all $n$, and let $s_n = \sum_{i=1}^{n} a_i$. Show that $(s_n)$ converges if and
only if $(s_n)$ is bounded.

Recall that a sequence of real numbers $(x_n)$ is said to be Cauchy if, for every
$\varepsilon > 0$, there is an integer $N \ge 1$ such that $|x_n - x_m| < \varepsilon$ whenever $n, m \ge N$.

▷ 14. Prove that a convergent sequence is Cauchy, and that any Cauchy sequence is
bounded.

▷ 15. Show that a Cauchy sequence with a convergent subsequence actually converges.

16. 

(a) Why is 0.4999 . . . = 0.5? (Try to give more than one reason.) 

(b) Write 0.234234234 ... as a fraction. 

(c) Precisely which real numbers between 0 and 1 have more than one decimal 
representation? Explain. 


Our second approach to describing the elements of R as limits of sequences of rational 
numbers is to consider decimals. We might as well do this in some generality. 
Proposition 1.7. Fix an integer $p \ge 2$, and let $(a_n)$ be any sequence of integers
satisfying $0 \le a_n \le p - 1$ for all $n$. Then $\sum_{n=1}^{\infty} a_n/p^n$ converges to a number in
$[0, 1]$.


Proof. Since $a_n \ge 0$, the partial sums $\sum_{n=1}^{N} a_n/p^n$ are nonnegative and increase
with $N$. Thus, to show that the series converges to some number in $[0, 1]$, we just
need to show that 1 is an upper bound for the sequence of partial sums. But this is
easy:
$$\sum_{n=1}^{\infty} \frac{a_n}{p^n} \le \sum_{n=1}^{\infty} \frac{p-1}{p^n} = 1.$$
(Why? What does this say when $p = 10$?) □


Conversely, each x in [ 0, 1 ] can be so represented: 


Proposition 1.8. Let $p$ be an integer, $p \ge 2$, and let $0 \le x \le 1$. Then there is a
sequence of integers $(a_n)$ with $0 \le a_n \le p - 1$ for all $n$ such that $x = \sum_{n=1}^{\infty} a_n/p^n$.


Proof. Certainly the case $x = 0$ causes no real strain, so let us suppose that
$0 < x \le 1$. We will construct $(a_n)$ by induction.

Choose $a_1$ to be the largest integer satisfying $a_1/p < x$. (How?) Since $x > 0$,
it follows that $a_1 \ge 0$; and since $x \le 1$, we have $a_1 < p$. Because $a_1$ is an integer,
this means that $a_1 \le p - 1$. Also, since $a_1$ is largest, we must have
$a_1/p < x \le (a_1 + 1)/p$.

Next, choose $a_2$ to be the largest integer satisfying $a_1/p + a_2/p^2 < x$. Check that
$0 \le a_2 \le p - 1$ and that $a_1/p + a_2/p^2 < x \le a_1/p + (a_2 + 1)/p^2$.

By induction we get a sequence of integers $(a_n)$ with $0 \le a_n \le p - 1$ such that
$$\frac{a_1}{p} + \cdots + \frac{a_n}{p^n} < x \le \frac{a_1}{p} + \cdots + \frac{a_{n-1}}{p^{n-1}} + \frac{a_n + 1}{p^n}.$$
Obviously, $x = \sum_{n=1}^{\infty} a_n/p^n$. (Why?) □


The series $\sum_{n=1}^{\infty} a_n/p^n$ is called a base $p$ (or $p$-adic) decimal expansion for $x$. It is
sometimes written in the shorter form $x = 0.a_1a_2a_3\cdots$ (base $p$). It does not have to be
unique (even for ordinary base 10 decimals: $0.5 = 0.4999\cdots$). One problem is that our
construction is designed to produce nonterminating decimal expansions. In the particular
case where $x = a_1/p + \cdots + (a_n + 1)/p^n = q/p^n$ for some integer $0 < q < p^n$, the
construction will give us a repeating string of $(p-1)$’s in the decimal expansion for $x$ since
$1/p^n = \sum_{k=n+1}^{\infty} (p-1)/p^k$. That is, any such $x$ has two distinct base $p$ decimal expansions:
$$x = \frac{a_1}{p} + \cdots + \frac{a_{n-1}}{p^{n-1}} + \frac{a_n + 1}{p^n} = \frac{a_1}{p} + \cdots + \frac{a_n}{p^n} + \sum_{k=n+1}^{\infty} \frac{p-1}{p^k}.$$
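The inductive choice of digits in Proposition 1.8 is easy to mechanize. Here is a minimal Python sketch, assuming $0 < x \le 1$ is given exactly as a `fractions.Fraction`. The function name `base_p_digits` is hypothetical. At step $k$, the largest integer $a$ with $s + a/p^k < x$ is $\lceil (x - s)p^k \rceil - 1$, where $s$ is the partial sum so far. Because the inequality is strict, the output reproduces the nonterminating expansion discussed above. For example, for $x = 1/2$ in base 10 it returns the digits of $0.4999\cdots$.

```python
from fractions import Fraction
from math import ceil


def base_p_digits(x: Fraction, p: int, n: int) -> list[int]:
    """First n digits of the nonterminating base-p expansion built in Proposition 1.8.

    Assumes 0 < x <= 1 and p >= 2. At step k, a_k is the largest integer with
    s + a_k / p**k < x, where s is the partial sum of the digits chosen so far.
    """
    digits: list[int] = []
    s = Fraction(0)
    for k in range(1, n + 1):
        scale = Fraction(1, p**k)
        # Largest integer a with a < (x - s) * p**k; this quantity is positive since s < x.
        a = ceil((x - s) / scale) - 1
        digits.append(a)
        s += a * scale
    return digits


print(base_p_digits(Fraction(1, 2), 10, 4))  # [4, 9, 9, 9]
print(base_p_digits(Fraction(1, 3), 2, 6))   # [0, 1, 0, 1, 0, 1]
```

At every stage the partial sum $s$ satisfies $s < x \le s + 1/p^n$. This is the two-sided estimate from the proof, so the digits stay in the range $0, \ldots, p - 1$.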

We now have several methods for finding a sequence of rationals that increase or decrease 
to a given real number. An application of this fact can be used to define expressions such 
as $a^x$ for real exponents $x$. For example, if $a > 1$, and if $x$ is any real number, then we
set $a^x = \sup\{a^r : r \in \mathbb{Q},\ r < x\}$. We get away with this because $a^r$ is well defined and
increasing for $r \in \mathbb{Q}$.

You may have been tempted to use logarithms or exponentials to define $a^x$, but we would
need a similar line of reasoning to define, say, $e^x$ (or even $e$ itself!), and we would need quite
a bit more machinery to define $\log x$. As long as we’ve already digressed from decimals,
let’s construct $e$. For this we’ll use a simple (but extremely useful) inequality.

Bernoulli’s Inequality 1.9. If $a > -1$, $a \ne 0$, then $(1 + a)^n > 1 + na$ for any
integer $n > 1$.

The proof of Bernoulli’s inequality is left as an exercise. We’ll apply it to prove: 

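Proposition 1.10.

(i) $\left(1 + \frac{1}{n}\right)^{n}$ is strictly increasing.

(ii) $\left(1 + \frac{1}{n}\right)^{n+1}$ is strictly decreasing.

(iii) $2 \le \left(1 + \frac{1}{n}\right)^{n} < \left(1 + \frac{1}{n}\right)^{n+1} \le 4$.

(iv) Both sequences converge to the same limit $e = \lim_{n\to\infty} (1 + (1/n))^n$, where
$2 < e < 4$.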


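Proof. (i) We need to show that $(1 + 1/(n+1))^{n+1}/(1 + (1/n))^n > 1$. For this we
rewrite and apply Bernoulli’s inequality:
$$\frac{\left(1 + \frac{1}{n+1}\right)^{n+1}}{\left(1 + \frac{1}{n}\right)^{n}}
= \left(\frac{n+2}{n+1}\right)^{n+1}\left(\frac{n}{n+1}\right)^{n}
= \left(1 - \frac{1}{(n+1)^2}\right)^{n+1}\left(\frac{n+1}{n}\right)
> \left(1 - \frac{1}{n+1}\right)\left(\frac{n+1}{n}\right) = 1 \quad \text{(by Bernoulli)}.$$

(ii) This case is very similar to (i):
$$\frac{\left(1 + \frac{1}{n}\right)^{n+1}}{\left(1 + \frac{1}{n+1}\right)^{n+2}}
= \left(\frac{n+1}{n}\right)^{n+1}\left(\frac{n+1}{n+2}\right)^{n+2}
= \left(1 + \frac{1}{n(n+2)}\right)^{n+2}\left(\frac{n}{n+1}\right)
> \left(1 + \frac{1}{n}\right)\left(\frac{n}{n+1}\right) = 1 \quad \text{(by Bernoulli)}.$$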


(iii) Since $1 + (1/n) > 1$, we have $(1 + (1/n))^n < (1 + (1/n))^{n+1}$. Since
$(1 + (1/n))^n$ increases, the left-hand side is at least 2 (the first term); and since
$(1 + (1/n))^{n+1}$ decreases, the right-hand side is at most 4 (the first term).
(iv) Finally, we define $e = \lim_{n\to\infty} (1 + (1/n))^n$, and conclude that
$$\lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n+1} = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^{n} \cdot \lim_{n\to\infty}\left(1 + \frac{1}{n}\right) = e. \qquad \square$$


The same proof applies to the sequence $(1 + (x/n))^n$ for any $x \in \mathbb{R}$, and we may define
$e^x = \lim_{n\to\infty} (1 + (x/n))^n$. The full details of this last conclusion are best left for another
day. See Exercise 18(b).
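Proposition 1.10 can be checked numerically, though of course not proved this way. The Python sketch below computes both sequences exactly with `fractions.Fraction` for $n = 1, \ldots, 40$ (an arbitrary cutoff of ours). It then asserts parts (i) through (iii) on that range. The helper names `lower` and `upper` are hypothetical.

```python
from fractions import Fraction


def lower(n: int) -> Fraction:
    """(1 + 1/n)^n, computed exactly."""
    return (1 + Fraction(1, n)) ** n


def upper(n: int) -> Fraction:
    """(1 + 1/n)^(n+1), computed exactly."""
    return (1 + Fraction(1, n)) ** (n + 1)


ns = range(1, 41)
lows = [lower(n) for n in ns]
highs = [upper(n) for n in ns]

# Parts (i)-(iii) of Proposition 1.10, checked only on this finite range of n.
assert all(x < y for x, y in zip(lows, lows[1:]))    # (i) strictly increasing
assert all(x > y for x, y in zip(highs, highs[1:]))  # (ii) strictly decreasing
assert all(2 <= lo < hi <= 4 for lo, hi in zip(lows, highs))  # (iii)

print(float(lows[-1]), float(highs[-1]))  # the two values bracket e
```

At $n = 40$ the two values are roughly 2.685 and 2.752, so the squeeze toward $e$ is rather slow.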


EXERCISES 

▷ 17. Given real numbers $a$ and $b$, establish the following formulas: $|a + b| \le |a| + |b|$,
$\big||a| - |b|\big| \le |a - b|$, $\max\{a, b\} = \frac{1}{2}(a + b + |a - b|)$, and
$\min\{a, b\} = \frac{1}{2}(a + b - |a - b|)$.

18.

(a) Given $a > -1$, $a \ne 0$, use induction to show that $(1 + a)^n > 1 + na$ for any
integer $n > 1$.

(b) Use (a) to show that, for any $x > 0$, the sequence $(1 + (x/n))^n$ increases.

(c) If $a > 0$, show that $(1 + a)^r > 1 + ra$ holds for any rational exponent $r > 1$.
[Hint: If $r = p/q$, then apply (a) with $n = q$ and (b) with $x = ap$.]

(d) Finally, show that (c) holds for any real exponent $r > 1$.

19. If $0 < c < 1$, show that $c^n \to 0$; and if $c > 0$, show that $c^{1/n} \to 1$. [Hint:
Use Bernoulli’s inequality for each, once with $c = 1/(1 + x)$, $x > 0$, and once with
$c^{1/n} = 1 + x_n$, where $x_n > 0$.]

20. Given $a, b > 0$, show that $\sqrt{ab} \le \frac{1}{2}(a + b)$ (this is the arithmetic-geometric
mean inequality). Generalize this to $(a_1 a_2 \cdots a_n)^{1/n} \le (1/n)(a_1 + a_2 + \cdots + a_n)$.
[Hint: Induction and Bernoulli’s inequality.]

▷ 21. Let $p \ge 2$ be a fixed integer, and let $0 < x < 1$. If $x$ has a finite-length base $p$
decimal expansion, that is, if $x = a_1/p + \cdots + a_n/p^n$ with $a_n \ne 0$, prove that $x$ has
precisely two base $p$ decimal expansions. Otherwise, show that the base $p$ decimal
expansion for $x$ is unique. Characterize the numbers $0 < x < 1$ that have repeating
base $p$ decimal expansions. How about eventually repeating?


As long as we are on the subject of sequences, this is a good time to outline part of
the master plan! Virtually everything that we need to know about the real line $\mathbb{R}$ and
about functions $f : \mathbb{R} \to \mathbb{R}$ can be described in terms of convergent sequences. Indeed,
a continuous function $f : \mathbb{R} \to \mathbb{R}$ could be defined as a function that “preserves”
convergent sequences: $f(\lim_{n\to\infty} x_n) = \lim_{n\to\infty} f(x_n)$. If we hope to understand
continuous functions (and we do!), then it is of great importance to us to know
precisely which real sequences converge. So far we know that monotone, bounded sequences
converge, and that any convergent sequence is necessarily bounded. (Why?)
These two facts together raise the question: Does every bounded sequence converge?
Of course not. But just how “far” from convergent is a typical bounded sequence? To
answer this, we will want to broaden our definition of limit. First a few easy observations.

Let $(a_n)$ be a bounded sequence of real numbers, and consider the sequences:
$$t_n = \inf\{a_n, a_{n+1}, a_{n+2}, \ldots\} \quad \text{and} \quad T_n = \sup\{a_n, a_{n+1}, a_{n+2}, \ldots\}.$$
Then $(t_n)$ increases, $(T_n)$ decreases, and $\inf_k a_k \le t_n \le T_n \le \sup_k a_k$ for all $n$. (Why?)
Thus we may speak of $\lim_{n\to\infty} t_n$ as the “lower limit” and $\lim_{n\to\infty} T_n$ as the “upper limit”
of our original sequence $(a_n)$. And that is exactly what we will do.

Now these same considerations are meaningful even if we start with an unbounded
sequence $(a_n)$, although in that case we will have to allow the values $\pm\infty$ for at least some
of the $t_n$’s or $T_n$’s (possibly both). That is, if we permit comparisons to $\pm\infty$, then the $t_n$’s
still increase and the $T_n$’s still decrease. Of course we will want to use $\sup_n t_n$ and $\inf_n T_n$
in place of $\lim_{n\to\infty} t_n$ and $\lim_{n\to\infty} T_n$, since “sup” and “inf” have more or less obvious
extensions to subsets of the extended real number system $[-\infty, +\infty]$ whereas “lim” does
not. Even so, we are sure to get caught saying something like “$(t_n)$ converges to $+\infty$.” But
we will pay a stiff penalty for too much rigor here; even a simple fact could have a tediously
long description. For the remainder of this section you are encouraged to interpret words
such as “limit” and “converges” in this looser sense.

Given any sequence of real numbers $(a_n)$, we define
$$\liminf_{n\to\infty} a_n = \varliminf_{n\to\infty} a_n = \sup_{n \ge 1}\big(\inf\{a_n, a_{n+1}, a_{n+2}, \ldots\}\big)$$
and
$$\limsup_{n\to\infty} a_n = \varlimsup_{n\to\infty} a_n = \inf_{n \ge 1}\big(\sup\{a_n, a_{n+1}, a_{n+2}, \ldots\}\big).$$
That is, $\liminf_{n\to\infty} a_n = \sup_n t_n$ ($= \lim_{n\to\infty} t_n$ if $(a_n)$ is bounded from below) and
$\limsup_{n\to\infty} a_n = \inf_n T_n$ ($= \lim_{n\to\infty} T_n$ if $(a_n)$ is bounded from above). The name “lim inf”
is short for “limit inferior,” while “lim sup” is short for “limit superior.”
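To get a concrete feel for $t_n$ and $T_n$, here is a minimal Python sketch that computes tail minima and maxima of a finite list. It is only an approximation. A computer sees finitely many terms, so `min(a[i:])` and `max(a[i:])` stand in for the infimum and supremum of an infinite tail. They are trustworthy only for indices well before the end of the list. The example sequence $a_n = (-1)^n(1 + 1/n)$ and the function name `tail_bounds` are our own choices. For this sequence, $\limsup_{n\to\infty} a_n = 1$ and $\liminf_{n\to\infty} a_n = -1$.

```python
from itertools import accumulate


def tail_bounds(a: list[float]) -> tuple[list[float], list[float]]:
    """Return (t, T) where t[i] = min(a[i:]) and T[i] = max(a[i:]).

    On a finite list the tail inf and sup are just a min and a max; they only
    approximate t_n and T_n of an infinite sequence for indices well before the end.
    """
    # Running max/min taken from the right end, then reversed back into place.
    T = list(accumulate(reversed(a), max))[::-1]
    t = list(accumulate(reversed(a), min))[::-1]
    return t, T


# Example sequence (our choice): a_n = (-1)^n (1 + 1/n) for n = 1, ..., 1000.
# List index i holds a_{i+1}.
a = [(-1) ** n * (1 + 1 / n) for n in range(1, 1001)]
t, T = tail_bounds(a)
print(T[500], t[500])  # already close to lim sup = 1 and lim inf = -1
```

As the text predicts, the list `T` never increases and the list `t` never decreases. Near the end of the list, however, both collapse onto the last few terms. That is an artifact of truncation, not a property of the sequence.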


EXERCISES 

22. Show that $\inf_n a_n \le \liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n \le \sup_n a_n$.

23. If $(a_n)$ is convergent, show that $\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n = \lim_{n\to\infty} a_n$.

▷ 24. Show that $\limsup_{n\to\infty}(-a_n) = -\liminf_{n\to\infty} a_n$.

▷ 25. If $\limsup_{n\to\infty} a_n = -\infty$, show that $(a_n)$ diverges to $-\infty$. If $\limsup_{n\to\infty} a_n =
+\infty$, show that $(a_n)$ has a subsequence that diverges to $+\infty$. What happens if
$\liminf_{n\to\infty} a_n = \pm\infty$?


If we start with a bounded sequence $(a_n)$, then

$$M = \limsup_{n\to\infty} a_n = \lim_{n\to\infty}\big(\sup\{a_k : k \ge n\}\big) \ne \pm\infty,$$

and hence:

for every $\varepsilon > 0$, there is an integer $N \ge 1$ such that
$M - \varepsilon < \sup\{a_k : k \ge n\} < M + \varepsilon$ for all $n \ge N$.

Thus, the number $M = \limsup_{n\to\infty} a_n$ is characterized by the following:

(*) for every $\varepsilon > 0$, we have $a_n < M + \varepsilon$ for all but finitely
many $n$, and $M - \varepsilon < a_n$ for infinitely many $n$.


EXERCISES 

▷ 26. Prove the characterization of lim sup given above. That is, given a bounded sequence $(a_n)$, show that the number $M = \limsup_{n\to\infty} a_n$ satisfies (*) and, conversely, that any number $M$ satisfying (*) must equal $\limsup_{n\to\infty} a_n$. State and prove the corresponding result for $m = \liminf_{n\to\infty} a_n$.

▷ 27. Prove that every sequence of real numbers $(a_n)$ has a subsequence $(a_{n_k})$ that converges to $\limsup_{n\to\infty} a_n$. [Hint: If $M = \limsup_{n\to\infty} a_n = \pm\infty$, we must interpret the conclusion loosely; this case is handled in Exercise 25. If $M \ne \pm\infty$, use (*) to choose $(a_{n_k})$ satisfying $|a_{n_k} - M| < 1/k$, for example.] There is necessarily also a subsequence that converges to $\liminf_{n\to\infty} a_n$. Why?

28. By modifying the argument in the previous exercise, show that every sequence 
of real numbers has a monotone subsequence. 

29. If $(a_{n_k})$ is a convergent subsequence of $(a_n)$, show that $\liminf_{n\to\infty} a_n \le \lim_{k\to\infty} a_{n_k} \le \limsup_{n\to\infty} a_n$.

30. If $a_n \le b_n$ for all $n$, and if $(a_n)$ converges, show that $\lim_{n\to\infty} a_n \le \liminf_{n\to\infty} b_n$.

31. If $(a_n)$ is convergent and $(b_n)$ is bounded, show that $\limsup_{n\to\infty} (a_n + b_n) \le \lim_{n\to\infty} a_n + \limsup_{n\to\infty} b_n$.

32. Given a sequence $(a_n)$ of real numbers, let $S$ be the set of all limits of convergent subsequences of $(a_n)$ (including, possibly, $\pm\infty$). For example, it follows from Exercise 27 that $\limsup_{n\to\infty} a_n$ and $\liminf_{n\to\infty} a_n$ are both elements of $S$. Show that, in fact, $\limsup_{n\to\infty} a_n = \sup S$ and $\liminf_{n\to\infty} a_n = \inf S$.


The ability to find a convergent subsequence of an arbitrary sequence, as in Exercise 27, 
leads to a whole slew of corollaries. See if you can supply proofs for the following: 

The Bolzano-Weierstrass Theorem 1.11. Every bounded sequence of real numbers 
has a convergent subsequence. 

Corollary 1.12. If $(a_n)$ is a convergent sequence, then $\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n = \lim_{n\to\infty} a_n$.

Corollary 1.13. Every Cauchy sequence of real numbers converges. 
Corollary 1.14. Every bounded sequence of real numbers has a Cauchy subsequence.

[Hint: See Exercises 14 and 15 for more on Cauchy sequences.] 

Finally, we come full circle: 

Proposition 1.15. If $(a_n)$ is bounded, and if $\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n$, then $(a_n)$ converges and $\lim_{n\to\infty} a_n = \limsup_{n\to\infty} a_n$.

Proof. Let $a = \liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n$, and let $\varepsilon > 0$. From our characterizations of lim inf and lim sup, there is an $N_1 \ge 1$ such that $a - \varepsilon < a_n$ for all $n \ge N_1$ (since $a = \liminf_{n\to\infty} a_n$), and there is an $N_2 \ge 1$ such that $a_n < a + \varepsilon$ for all $n \ge N_2$ (since $a = \limsup_{n\to\infty} a_n$). Thus, for $n \ge \max\{N_1, N_2\}$ we have $|a - a_n| < \varepsilon$. □

You may recall that a sequence of real numbers converges if and only if it is Cauchy. 
Although one approach to this fact has already been suggested in the exercises, it is such 
an important property of the real numbers that it is well worth the effort to give a second 
proof! 

First recall that if a sequence converges, then it is Cauchy; and if a sequence is Cauchy, 
then it is also bounded. (See the exercises for more details.) We want to reverse the first 
implication, and so we may assume that we have a bounded sequence to start with. This 
helps, since for a bounded sequence $(a_n)$ both $\limsup_{n\to\infty} a_n$ and $\liminf_{n\to\infty} a_n$ are (finite)
real numbers. Given a Cauchy sequence, then, we only need to check that these two numbers 
are equal, which is easier than it might sound. 

Theorem 1.16. A sequence of real numbers converges if (and only if) it is Cauchy. 

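Proof. Let $(a_n)$ be Cauchy, and let $\varepsilon > 0$. Choose $N \ge 1$ such that $|a_n - a_m| < \varepsilon$ for all $m, n \ge N$. Then, in particular, we have $a_N - \varepsilon < a_n < a_N + \varepsilon$ for all $n \ge N$; thus, $(a_n)$ is bounded. But $a_N - \varepsilon < a_n$ for $n \ge N$ implies that $a_N - \varepsilon \le \liminf_{n\to\infty} a_n$, while $a_n < a_N + \varepsilon$ for $n \ge N$ implies that $\limsup_{n\to\infty} a_n \le a_N + \varepsilon$. (Why?) Since $-\infty < \liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n < \infty$, we may subtract these results and conclude that $\limsup_{n\to\infty} a_n - \liminf_{n\to\infty} a_n \le 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, we get that $\limsup_{n\to\infty} a_n = \liminf_{n\to\infty} a_n$, and so $(a_n)$ converges by Proposition 1.15. □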


EXERCISES 

▷ 33. Show that $(x_n)$ converges to $x \in \mathbb{R}$ if and only if every subsequence $(x_{n_k})$ of $(x_n)$ has a further subsequence $(x_{n_{k_i}})$ that converges to $x$.

34. Suppose that $a_n > 0$ and that $\sum_{n=1}^\infty a_n < \infty$.

(i) Show that $\liminf_{n\to\infty} n a_n = 0$.

(ii) Give an example showing that $\limsup_{n\to\infty} n a_n > 0$ is possible.





35. (The ratio test): Let $a_n > 0$.

(i) If $\limsup_{n\to\infty} a_{n+1}/a_n < 1$, show that $\sum_{n=1}^\infty a_n < \infty$.

(ii) If $\liminf_{n\to\infty} a_{n+1}/a_n > 1$, show that $\sum_{n=1}^\infty a_n$ diverges.

(iii) Find examples of both a convergent and a divergent series having $\lim_{n\to\infty} a_{n+1}/a_n = 1$.

36. (The root test): Let $a_n > 0$.

(i) If $\limsup_{n\to\infty} \sqrt[n]{a_n} < 1$, show that $\sum_{n=1}^\infty a_n < \infty$.

(ii) If $\liminf_{n\to\infty} \sqrt[n]{a_n} > 1$, show that $\sum_{n=1}^\infty a_n$ diverges.

(iii) Find examples of both a convergent and a divergent series having $\lim_{n\to\infty} \sqrt[n]{a_n} = 1$.

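▷ 37. If $(E_n)$ is a sequence of subsets of a fixed set $S$, we define

$$\limsup_{n\to\infty} E_n = \bigcap_{n=1}^\infty \bigcup_{k=n}^\infty E_k \quad\text{and}\quad \liminf_{n\to\infty} E_n = \bigcup_{n=1}^\infty \bigcap_{k=n}^\infty E_k.$$

Show that

$$\liminf_{n\to\infty} E_n \subset \limsup_{n\to\infty} E_n \quad\text{and that}\quad \liminf_{n\to\infty} (E_n^c) = \Big(\limsup_{n\to\infty} E_n\Big)^c.$$

38. Show that

$$\limsup_{n\to\infty} E_n = \{x \in S : x \in E_n \text{ for infinitely many } n\}$$

and that

$$\liminf_{n\to\infty} E_n = \{x \in S : x \in E_n \text{ for all but finitely many } n\}.$$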

39. How would you define the limit (if it exists) of a sequence of sets? What should the limit be if $E_1 \supset E_2 \supset \cdots$? If $E_1 \subset E_2 \subset \cdots$? Compute $\liminf_{n\to\infty} E_n$ and $\limsup_{n\to\infty} E_n$ in both cases and test your conjecture.


Limits and Continuity 

In this section we present a brief refresher course on limits and continuity for real-valued 
functions. With any luck, much of what we have to say will be very familiar. To begin, 
let $f$ be a real-valued function defined (at least) for all points in some open interval containing the point $a \in \mathbb{R}$ except, possibly, at $a$ itself. We will refer to such a set as a
punctured neighborhood of $a$. Given a number $L \in \mathbb{R}$, we write $\lim_{x\to a} f(x) = L$ to
mean:

for every $\varepsilon > 0$, there is some $\delta > 0$ such that $|f(x) - L| < \varepsilon$
whenever $x$ satisfies $0 < |x - a| < \delta$.

We say that $\lim_{x\to a} f(x)$ exists if there is some number $L \in \mathbb{R}$ that satisfies the requirements
spelled out above. The proof of our first result is left as an exercise.

Theorem 1.17. Let $f$ be a real-valued function defined in some punctured neighborhood of $a \in \mathbb{R}$. Then, the following are equivalent:

(i) There exists a number $L$ such that $\lim_{x\to a} f(x) = L$ (by the $\varepsilon$-$\delta$ definition).

(ii) There exists a number $L$ such that $f(x_n) \to L$ whenever $x_n \to a$, where $x_n \ne a$ for all $n$.

(iii) $(f(x_n))$ converges (to something) whenever $x_n \to a$, where $x_n \ne a$ for all $n$.

The point to item (iii) is that if $\lim_{n\to\infty} f(x_n)$ always exists, then it must actually be
independent of the choice of $(x_n)$. This is not as mystical as it might sound; indeed, if
$x_n \to a$ and $y_n \to a$, then the sequence $x_1, y_1, x_2, y_2, \dots$ also converges to $a$. (How does
this help?) This particular phrasing is interesting because it does not refer to $L$. That is, we
can test for the existence of a limit without knowing its value.

Now suppose that $f$ is defined in a neighborhood of $a$, this time including the point $a$
itself. We say that $f$ is continuous at $a$ if $\lim_{x\to a} f(x) = f(a)$. That is, if:

for every $\varepsilon > 0$, there is a $\delta > 0$ (that depends on $f$, $a$, and $\varepsilon$)
such that $|f(x) - f(a)| < \varepsilon$ whenever $x$ satisfies $|x - a| < \delta$.

Notice that we replaced $L$ by $f(a)$ and we dropped the requirement that $x \ne a$. Theorem 1.17 has an obvious extension to this case (and its proof is also left as an exercise).

Theorem 1.18. Let $f$ be a real-valued function defined in some neighborhood of
$a \in \mathbb{R}$. Then, the following are equivalent:

(i) $f$ is continuous at $a$ (by the $\varepsilon$-$\delta$ definition);

(ii) $f(x_n) \to f(a)$ whenever $x_n \to a$;

(iii) $(f(x_n))$ converges (to something) whenever $x_n \to a$.

Notice that we dropped the requirement that $x_n \ne a$. Thus, if $\lim_{n\to\infty} f(x_n)$ always exists,
then it must equal $f(a)$. (Why?)

You might also recall that we have a notation for left- and right-hand limits and left and 
right continuity. For example, if we define 

$$f(a-) = \lim_{x\to a^-} f(x) \quad\text{and}\quad f(a+) = \lim_{x\to a^+} f(x)$$

(provided that these limits exist, of course), then we could add another equivalence to
Theorem 1.18:

1.18 (iv) $f(a-)$ and $f(a+)$ both exist, and both are equal to $f(a)$.

One-sided limits are peculiar to functions defined on $\mathbb{R}$, and they do not generalize very
well (because they are tied to the order in $\mathbb{R}$). But they are very good at what they do: They
permit the cataloguing of very refined types of discontinuities. For example, we say that $f$
is right-continuous at $a$ if $f(a+)$ exists and equals $f(a)$, and we say that $f$ has a jump
discontinuity at $a$ if $f(a-)$ and $f(a+)$ both exist but at least one is different from $f(a)$.
A function having only jump discontinuities is not so very bad. In particular, monotone 
functions are rather well behaved: 
Proposition 1.19. Let $f : (a, b) \to \mathbb{R}$ be monotone and let $a < c < b$. Then, $f(c-)$ and $f(c+)$ both exist. Thus, $f$ can have only jump discontinuities.

Proof. We might as well suppose that $f$ is increasing (otherwise, consider $-f$). In that case, $f(c)$ is an upper bound for $\{f(t) : a < t < c\}$ and a lower bound for $\{f(t) : c < t < b\}$. All that remains is to check that $\sup\{f(t) : a < t < c\} = \lim_{x\to c^-} f(x)$ and $\inf\{f(t) : c < t < b\} = \lim_{x\to c^+} f(x)$. We will sketch the proof of the first of these.

Given $\varepsilon > 0$, there is some $x_0$ with $a < x_0 < c$ such that $\sup_{t<c} f(t) - \varepsilon < f(x_0) \le \sup_{t<c} f(t)$. Now let $\delta = c - x_0 > 0$. Then, if $c - \delta < x < c$, we get $x_0 < x < c$, and so $f(x_0) \le f(x) \le \sup_{t<c} f(t)$. Thus, $|f(x) - \sup_{t<c} f(t)| < \varepsilon$. □


EXERCISES 

40. Prove Theorem 1.17. 

41. Prove Theorem 1.18, including 1.18 (iv) as one of the equivalent conditions. 

42. Given $f : (a, b) \to \mathbb{R}$ and $x \in (a, b)$, consider the statements: (i) $\lim_{h\to 0} |f(x + h) - f(x)| = 0$ and (ii) $\lim_{h\to 0} |f(x + h) - f(x - h)| = 0$. Show that (i) always implies (ii). Give an example where (ii) holds but not (i).

43. Modify Theorem 1.17 to characterize the statement $\lim_{x\to a^+} f(x) = L$, and check your new version by providing a proof!

44. If $f : \mathbb{R} \to \mathbb{R}$ is increasing and bounded, show that $\lim_{x\to\infty} f(x)$ and $\lim_{x\to-\infty} f(x)$ both exist.

▷ 45. Let $f : [a, b] \to \mathbb{R}$ be continuous and suppose that $f(x) = 0$ whenever $x$ is rational. Show that $f(x) = 0$ for every $x$ in $[a, b]$.

▷ 46. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous.

(a) If $f(0) > 0$, show that $f(x) > 0$ for all $x$ in some open interval $(-a, a)$.

(b) If $f(x) \ge 0$ for every rational $x$, show that $f(x) \ge 0$ for all real $x$. Will this result hold with “$\ge 0$” replaced by “$> 0$”? Explain.

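47. Let $f$, $g$, $h$, and $k$ be defined on $[0, 1]$ as follows:

$$f(x) = \begin{cases} 0 & \text{if } x \notin \mathbb{Q} \\ 1 & \text{if } x \in \mathbb{Q} \end{cases} \qquad g(x) = \begin{cases} 0 & \text{if } x \notin \mathbb{Q} \\ x & \text{if } x \in \mathbb{Q} \end{cases}$$

$$h(x) = \begin{cases} 1 - x & \text{if } x \notin \mathbb{Q} \\ x & \text{if } x \in \mathbb{Q} \end{cases} \qquad k(x) = \begin{cases} 0 & \text{if } x \notin \mathbb{Q} \\ 1/n & \text{if } x = m/n \in \mathbb{Q} \text{ (in lowest terms).} \end{cases}$$

Prove that $f$ is not continuous at any point in $[0, 1]$, that $g$ is continuous only at $x = 0$, that $h$ is continuous only at $x = 1/2$, and that $k$ is continuous only at the irrational points in $[0, 1]$.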

48. Give an example of a one-to-one, onto function $f : [0, 1] \to [0, 1]$ that is
not monotone. Can you find a monotone, one-to-one function that is not onto? Or a 
monotone, onto function that is not one-to-one? 
49. Let $f : (a, b) \to \mathbb{R}$ be monotone and let $a < x < b$. Show that $f$ is continuous at $x$ if and only if $f(x-) = f(x+)$.

50. Let $D$ denote the set of rationals in $[0, 1]$ and suppose that $f : D \to \mathbb{R}$ is increasing. Show that there is an increasing function $g : [0, 1] \to \mathbb{R}$ such that $g(x) = f(x)$ whenever $x$ is rational. [Hint: For $x \in [0, 1]$, define $g(x) = \sup\{f(t) : 0 \le t \le x,\ t \in \mathbb{Q}\}$.]

51. Let $f : [a, b] \to \mathbb{R}$ be increasing and define $g : [a, b] \to \mathbb{R}$ by $g(x) = f(x+)$ for $a \le x < b$ and $g(b) = f(b)$. Prove that $g$ is increasing and right-continuous.


Notes and Remarks 

Although we cannot claim to have reviewed every last detail that you might need for an 
untroubled reading of these pages, we have managed to at least recall several important 
issues. Battle [1964] and Fulks [1969] are good sources for a review of advanced calculus; 
Apostol [1975] and Stromberg [1981] are good sources for further details on the topics
discussed in this chapter. 

Full details of the construction of the real numbers “from scratch” can be found in
Birkhoff and MacLane [1965], Goffman [1953a], Hewitt and Stromberg [1965], and 
Sprecher [1970]. For more on the various equivalent notions of completeness for the real 
numbers, see the aptly titled article “Completeness of the real numbers” in Goffman [1974]. 
For more on the history of rigorous analysis, see Boyer [1968], Edwards [1979], Grabiner 
[1983], Grattan-Guinness [1970], Kitcher [1983], Kleiner [1989], and Kline [1972]. As
an interesting tidbit in this vein, Dudley [1989] points out that no proof of the so-called
Bolzano-Weierstrass theorem (Theorem 1.11) has ever been found among Bolzano’s writings. For a curious observation about real numbers with “ambiguous” decimal representations, see Petkovšek [1990].

Exercise 42 is taken from Apostol [1975]. 
Countable and Uncountable Sets


Equivalence and Cardinality 

We have seen that the rational numbers are densely distributed on the real line in the 
sense that there is always a rational between any two distinct real numbers. But even 
more is true. In fact, it follows that there must be infinitely many rational numbers 
between any two distinct reals. (Why?) In sharp contrast to this picture of the rationals 
as a “dense” set, we will show in this section that the rational numbers are actually 
rather sparsely represented among the real numbers. We will do so by “counting” the 
rationals! 

We say that two sets $A$ and $B$ are equivalent if there is a one-to-one correspondence
between them. That is, $A$ and $B$ are equivalent if there exists some function $f : A \to B$
that is both one-to-one and onto. As a quick example, you might recall from calculus that
the map $x \mapsto \arctan x$ is a strictly increasing (hence one-to-one) function from $\mathbb{R}$ onto
the open interval $(-\pi/2, \pi/2)$. Thus, $\mathbb{R}$ is equivalent to $(-\pi/2, \pi/2)$. For convenience
we may occasionally write $A \sim B$ in place of the phrase “$A$ is equivalent to $B$.” Please
note that the relation “is equivalent to” is an equivalence relation.

The notion of equivalence is supposed to lead us to a notion of the relative sizes of sets. 
Equivalent sets should, by rights, have the same “number” of elements. For this reason 
we sometimes say that equivalent sets have the same cardinality. (A cardinal number 
is a number that indicates size without regard to order; we will have more to say about 
cardinal numbers later.) We put this to immediate use: A set $A$ is called finite if $A = \emptyset$
or if $A$ is equivalent to the set $\{1, 2, \dots, n\}$ for some $n \in \mathbb{N}$; otherwise, we say that $A$ is
infinite. It follows that an infinite set must contain finite subsets of all orders. (Why?)

An infinite set $A$ is said to be countable (or countably infinite) if $A$ is equivalent to
$\mathbb{N}$. That is, the elements of a countable set $A$ can be enumerated, or counted, according
to their correspondence with the natural numbers: $A = \{x_1, x_2, x_3, \dots\}$, where the $x_i$
are distinct. Note that this is not quite the same as a sequence. Here $A$ is the range of
a one-to-one function $f : \mathbb{N} \to A$ and we are simply displaying the elements of $A$ in
the order inherited from $\mathbb{N}$; that is, $A = \{f(1), f(2), \dots\}$. Let us look at a few specific
examples.

Examples 2.1 

(a) $\mathbb{Z} \sim \mathbb{N}$. To see this, define $f : \mathbb{Z} \to \mathbb{N}$ by $f(n) = 2n$ if $n \ge 1$ and $f(n) = -2n + 1$
if $n \le 0$. The positive integers in $\mathbb{Z}$ are mapped to the even numbers in $\mathbb{N}$, while
0 and the negative integers in $\mathbb{Z}$ are mapped to the odd numbers in $\mathbb{N}$. That $f$ is
both one-to-one and onto is easy to check. Notice, please, that $\mathbb{Z}$ is equivalent
to a proper subset of itself! This is typical of infinite sets.

(b) $\mathbb{N} \times \mathbb{N} \sim \mathbb{N}$. A quick proof is supplied by the fundamental theorem of arithmetic:
Each positive integer $k \in \mathbb{N}$ can be uniquely written as $k = 2^{m-1}(2n - 1)$ for
some $m, n \in \mathbb{N}$. (Factor out the largest power of 2 from $k$ and what remains
is necessarily an odd number.) Here is our map: Define $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ by
$f(m, n) = 2^{m-1}(2n - 1)$. That $f$ is both one-to-one and onto is obvious. We will
give a second proof shortly.
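Both maps in Examples 2.1 are easy to compute, so a quick Python sketch can spot-check them on finite windows; this is evidence on finitely many values, not a proof. The helper names `z_to_n`, `pair`, and `unpair` are ours, chosen for illustration. `unpair` inverts $f(m, n) = 2^{m-1}(2n - 1)$ by stripping factors of 2, exactly as in the parenthetical remark in (b).

```python
def z_to_n(n: int) -> int:
    """Example 2.1(a): positive integers go to evens, 0 and negatives to odds."""
    return 2 * n if n >= 1 else -2 * n + 1

def pair(m: int, n: int) -> int:
    """Example 2.1(b): f(m, n) = 2^(m-1) * (2n - 1) for m, n >= 1."""
    return 2 ** (m - 1) * (2 * n - 1)

def unpair(k: int) -> tuple:
    """Inverse of pair for k >= 1: factor out the largest power of 2 from k."""
    m = 1
    while k % 2 == 0:
        k //= 2
        m += 1
    # What remains is odd, say 2n - 1, so n = (k + 1) / 2.
    return m, (k + 1) // 2

# Spot checks on finite windows (evidence, not a proof).
print(sorted(z_to_n(n) for n in range(-50, 51)) == list(range(1, 102)))
print(all(pair(*unpair(k)) == k for k in range(1, 1001)))
print(all(unpair(pair(m, n)) == (m, n) for m in range(1, 11) for n in range(1, 11)))
```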

In actual practice it makes life easier if we simply lump finite and countably infinite 
sets together under the heading of countable sets or, to be precise, at-most-countable 
sets. After all, the elements of a finite set can surely be counted. The easiest way to 
perform this consolidation is by modifying our definition of a countable set. Henceforth, 
we will say that a countable set is one that is equivalent to some subset of $\mathbb{N}$. This
obviously now includes finite sets, but does it include any new, inappropriate sets? To 
see that this gives us just what we wanted, we prove: 

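Lemma 2.2. An infinite subset of $\mathbb{N}$ is countable; that is, if $A \subset \mathbb{N}$ and if $A$ is
infinite, then $A$ is equivalent to $\mathbb{N}$.

Proof. Recall that $\mathbb{N}$ is well ordered. That is, each nonempty subset of $\mathbb{N}$ has a
smallest element. Thus, since $A \ne \emptyset$, there is a smallest element $x_1 \in A$. Then
$A \setminus \{x_1\} \ne \emptyset$, and there must be a smallest $x_2 \in A \setminus \{x_1\}$. But now $A \setminus \{x_1, x_2\} \ne \emptyset$,
and so we continue, setting $x_3 = \min(A \setminus \{x_1, x_2\})$. By induction we can find
$x_1, x_2, x_3, \dots, x_n, \dots \in A$, where $x_n = \min(A \setminus \{x_1, \dots, x_{n-1}\})$. Note that $x_1 < x_2 < x_3 < \cdots$, since each $x_n$ is the minimum of a set that also contains $x_{n+1}$.

How do we know that this process exhausts $A$? Well, suppose that $x \in A \setminus \{x_1, x_2, \dots\} \ne \emptyset$. Then the set $\{k : x_k > x\}$ must be nonempty (the $x_k$ are distinct natural numbers, so only finitely many of them can be less than $x$), and hence it has a least element $n$. That is, $x_k < x$ for all $k < n$, while $x < x_n$. But then $x \in A \setminus \{x_1, \dots, x_{n-1}\}$ and $x < x_n$, which contradicts the choice of
$x_n$ as the first element in $A \setminus \{x_1, \dots, x_{n-1}\}$. Consequently, $A$ is countable. □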
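The construction in this proof is effective whenever membership in $A$ can be tested. In the Python sketch below, $A$ is passed as a membership predicate, and the perfect squares serve as a hypothetical example of an infinite subset of $\mathbb{N}$. Scanning $k = 1, 2, 3, \dots$ in increasing order and keeping the members of $A$ produces precisely the successive minima $x_1, x_2, \dots$; counting the members of $A$ up to $x$ recovers the index $n$ with $x_n = x$, which is the inverse of the correspondence.

```python
import math
from itertools import count, islice
from typing import Callable, Iterator

def successive_minima(in_A: Callable[[int], bool]) -> Iterator[int]:
    """Yield x_1, x_2, x_3, ..., where x_n = min(A minus {x_1, ..., x_{n-1}}).

    A is a subset of N = {1, 2, 3, ...} given by a membership test. Scanning
    k = 1, 2, 3, ... upward and keeping each k in A yields exactly these minima,
    in order. If A were finite, the loop would run forever after its last element.
    """
    for k in count(1):
        if in_A(k):
            yield k

def position(x: int, in_A: Callable[[int], bool]) -> int:
    """For x in A, return the n with x_n = x: the number of elements of A in 1..x."""
    return sum(1 for k in range(1, x + 1) if in_A(k))

def is_square(k: int) -> bool:
    """Hypothetical example of an infinite A: the perfect squares."""
    r = math.isqrt(k)
    return r * r == k

print(list(islice(successive_minima(is_square), 6)))   # [1, 4, 9, 16, 25, 36]
print(position(49, is_square))                          # 7
```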

It follows from Lemma 2.2 that a subset of $\mathbb{N}$ is either finite or is infinite and
equivalent to $\mathbb{N}$. Please be forewarned: Not all authors agree with the convention that
we have adopted. We have chosen to group finite and countably infinite sets together 
under the heading of countable sets to avoid the nuisance of providing two separate 
statements for each of our results. 

The proof of Lemma 2.2 shows that an infinite subset $S$ of $\mathbb{N}$ can be written as a
strictly increasing subsequence of $\mathbb{N}$; that is, $S = \{n_1 < n_2 < n_3 < \cdots\}$. This, together
with the order properties of the real line $\mathbb{R}$, makes short work of finding monotone
subsequences.

Theorem 2.3. Every sequence of real numbers has a monotone subsequence. 

Proof. Given a sequence $(a_n)$, let $S = \{n : a_m > a_n \text{ for all } m > n\}$. If $S$ is infinite,
with elements $n_1 < n_2 < n_3 < \cdots$, then $a_{n_1} < a_{n_2} < a_{n_3} < \cdots$ is a (strictly)
increasing subsequence.

If, on the other hand, $S$ is finite, then $\mathbb{N} \setminus S$ is a nonempty subset of $\mathbb{N}$. Thus,
there is a least element $n_1 \in \mathbb{N} \setminus S$ such that $n \notin S$ for all $n \ge n_1$. Since $n_1 \notin S$, there
is some $n_2 > n_1$ such that $a_{n_2} \le a_{n_1}$. But $n_2 \notin S$, and so there is some $n_3 > n_2$
such that $a_{n_3} \le a_{n_2}$. And so on. Thus, $a_{n_1} \ge a_{n_2} \ge a_{n_3} \ge \cdots$ is a decreasing
subsequence. □
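Both halves of this proof can be traced on a finite list, with the caveat that the dichotomy “$S$ finite or infinite” only makes sense for the full infinite sequence. In this Python sketch (0-based indices, with a hypothetical sample list), `peak_set` computes $S$ relative to the list, so the terms at those indices increase strictly; `decreasing_chain` follows the second half of the proof by repeatedly jumping to a later index $m$ with $a_m \le a_n$.

```python
def peak_set(a: list) -> list:
    """Indices n (0-based) with a[m] > a[n] for every later index m in the list.

    Scans from the right, tracking the minimum of the terms after position n.
    """
    S = []
    later_min = float("inf")
    for n in range(len(a) - 1, -1, -1):
        if a[n] < later_min:
            S.append(n)
        later_min = min(later_min, a[n])
    return S[::-1]

def decreasing_chain(a: list, start: int) -> list:
    """From index start, repeatedly jump to the first later m with a[m] <= a[n]."""
    chain, n = [start], start
    while True:
        nxt = next((m for m in range(n + 1, len(a)) if a[m] <= a[n]), None)
        if nxt is None:
            return chain
        chain.append(nxt)
        n = nxt

a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]          # hypothetical sample data
S = peak_set(a)
print(S, [a[n] for n in S])                 # [3, 6, 9] [1, 2, 3]
chain = decreasing_chain(a, 0)
print(chain, [a[n] for n in chain])         # [0, 1, 3] [3, 1, 1]
```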

We cannot pass up a chance to drop a few names: 

Corollary 2.4. (The Bolzano-Weierstrass Theorem) Every bounded sequence
of real numbers has a convergent subsequence. 

Corollary 2.5. Every Cauchy sequence of real numbers converges. 


EXERCISES 

1. Check that the relation “is equivalent to” defines an equivalence relation. That is, show that (i) $A \sim A$, (ii) $A \sim B$ if and only if $B \sim A$, and (iii) if $A \sim B$ and $B \sim C$, then $A \sim C$.

2. If $A$ is an infinite set, prove that $A$ contains a subset of size $n$ for any $n \ge 1$.

3. Given finitely many countable sets $A_1, \dots, A_n$, show that $A_1 \cup \cdots \cup A_n$ and $A_1 \times \cdots \times A_n$ are countable sets.

▷ 4. Show that any infinite set has a countably infinite subset.

5. Prove that a set is infinite if and only if it is equivalent to a proper subset of itself. [Hint: If $A$ is infinite and $x \in A$, show that $A$ is equivalent to $A \setminus \{x\}$.]

▷ 6. If $A$ is infinite and $B$ is countable, show that $A$ and $A \cup B$ are equivalent. [Hint: No containment relation between $A$ and $B$ is assumed here.]

7. Let $A$ be countable. If $f : A \to B$ is onto, show that $B$ is countable; if $g : C \to A$ is one-to-one, show that $C$ is countable. [Hint: Be careful!]

8. Show that $(0, 1)$ is equivalent to $[0, 1]$ and to $\mathbb{R}$.

9. Show that $(0, 1)$ is equivalent to the unit square $(0, 1) \times (0, 1)$. [Hint: “Interlace” decimals, but carefully!]

10. Prove that $(0, 1)$ can be put into one-to-one correspondence with the set of all functions $f : \mathbb{N} \to \{0, 1\}$.


To motivate our next several results, we present a second proof that $\mathbb{N} \times \mathbb{N}$ is equivalent to $\mathbb{N}$. We begin by arranging the elements of $\mathbb{N} \times \mathbb{N}$ in a matrix (see Figure 2.1).

The arrows have been added to show how we are going to enumerate $\mathbb{N} \times \mathbb{N}$. We will
count the pairs in the order indicated by the arrows: (1, 1), (2, 1), (1, 2), (3, 1), (2, 2),
and so on, accounting for each upward slanting diagonal in succession.

Figure 2.1. The elements of $\mathbb{N} \times \mathbb{N}$ arranged in a matrix (the arrows tracing the upward-slanting diagonals are not reproduced):

$$\begin{array}{ccccc}
(1,1) & (1,2) & (1,3) & (1,4) & \cdots \\
(2,1) & (2,2) & (2,3) & \cdots & \\
(3,1) & (3,2) & \cdots & & \\
(4,1) & \cdots & & & \\
\vdots & & & &
\end{array}$$

Notice that all of the pairs along a given diagonal have the same sum. The entries of
(1, 1) add to 2, the entries of both (2, 1) and (1, 2) add to 3, each pair of entries on the
next diagonal adds to 4, and so on. Moreover, for any given $n$, there are exactly $n$ pairs
whose entries sum to $n + 1$. Said in other words, there are exactly $n$ pairs on the $n$th
diagonal. Based on these observations, it is possible to give an explicit formula for this
correspondence between $\mathbb{N}$ and $\mathbb{N} \times \mathbb{N}$. We leave the details as Exercise 11.
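The enumeration along diagonals can be written as a generator. In this Python sketch, `diagonal_pairs` is a hypothetical helper name; it walks the $d$th diagonal, the $d$ pairs with $m + n = d + 1$, from $(d, 1)$ up to $(1, d)$, which reproduces the order (1, 1), (2, 1), (1, 2), (3, 1), (2, 2), ... described in the text. It illustrates the listing only; it is not the closed-form answer asked for in Exercise 11.

```python
from itertools import count, islice
from typing import Iterator, Tuple

def diagonal_pairs() -> Iterator[Tuple[int, int]]:
    """List N x N diagonal by diagonal, as in Figure 2.1.

    The d-th diagonal consists of the d pairs (m, n) with m + n = d + 1; it is
    walked upward, from (d, 1) to (1, d).
    """
    for d in count(1):
        for n in range(1, d + 1):
            yield (d + 1 - n, n)

print(list(islice(diagonal_pairs(), 6)))
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
```

Each pair $(m, n)$ shows up after finitely many steps, on diagonal $m + n - 1$, which is why the listing covers all of $\mathbb{N} \times \mathbb{N}$.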

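Now the fact that $\mathbb{N} \times \mathbb{N} \sim \mathbb{N}$ actually gives us a ton of new information. For example:

Theorem 2.6. The countable union of countable sets is countable; that is, if $A_i$ is countable for $i = 1, 2, 3, \dots$, then $\bigcup_{i=1}^\infty A_i$ is countable.

Proof. Since each $A_i$ is countable, we can arrange their elements collectively in a matrix:

$$\begin{array}{lllll}
A_1: & a_{1,1} & a_{1,2} & a_{1,3} & \cdots \\
A_2: & a_{2,1} & a_{2,2} & a_{2,3} & \cdots \\
A_3: & a_{3,1} & a_{3,2} & a_{3,3} & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array}$$

and so $\bigcup_{i=1}^\infty A_i$ is the range of a map on $\mathbb{N} \times \mathbb{N}$. (How?) That is, $\bigcup_{i=1}^\infty A_i$ is
equivalent to a subset of $\mathbb{N} \times \mathbb{N}$ and hence to a subset of $\mathbb{N}$. □

Corollary 2.7. $\mathbb{Q}$ is countable. (Why?)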

Example 2.8 

While we are at it, let us make an observation about decimals. Given an integer 
$p \ge 2$, recall that the real numbers having a nonunique base $p$ decimal expansion
are of the form $a/p^n$, where $a \in \mathbb{Z}$ and $n = 0, 1, 2, \dots$. Thus, only countably
many reals have a nonunique base $p$ decimal expansion. (Why?) In fact, because
there are only countably many bases $p$ to consider, the set of real numbers having
a nonunique decimal expansion relative to some base is still a countable set.


EXERCISES 

11. Here is an explicit correspondence between $\mathbb{N} \times \mathbb{N}$ and $\mathbb{N}$ (based on the “diagonal” argument preceding Theorem 2.6). Let $a_1 = 0$, and for $n = 2, 3, \dots$, let $a_n = \sum_{i=1}^{n-1} i = n(n-1)/2$. Show that the correspondence $(m, n) \mapsto a_{m+n-1} + n$, from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$, is both one-to-one and onto. Said in another way, show that the map $m \mapsto (a_n - m + 1,\ m - a_{n-1})$, where $n$ is chosen so that $a_{n-1} < m \le a_n$, defines a one-to-one correspondence from $\mathbb{N}$ onto $\mathbb{N} \times \mathbb{N}$.

12. Given an integer $p \ge 2$, “count” the real numbers in $(0, 1)$ that have an eventually repeating base $p$ decimal expansion.

▷ 13. Show that $\mathbb{N}$ contains infinitely many pairwise disjoint infinite subsets.

14. Prove that any infinite set can be written as the countably infinite union of 
pairwise disjoint infinite subsets. 

▷ 15. Show that any collection of pairwise disjoint, nonempty open intervals in $\mathbb{R}$ is
at most countable. [Hint: Each one contains a rational!]

16. The algebraic numbers are those real or complex numbers that are the roots of 
polynomials having integer coefficients. Prove that the set of algebraic numbers is 
countable. [Hint: First show that the set of polynomials having integer coefficients is 
countable.] 


Any infinite set that is not countable is called uncountable, for obvious reasons. 
Countably infinite sets are considered “small” infinite sets, while uncountable sets are 
“big” infinite sets (see the exercises for more on this). From this point of view, $\mathbb{Q}$ is
“small” relative to $\mathbb{R}$:

Theorem 2.9. $\mathbb{R}$ is uncountable.

Proof. To begin, first note that it is enough to show that $\mathbb{R}$ has an uncountable
subset. (Why?) Thus, it is enough to show that $(0, 1)$ is uncountable. To accomplish
this we will show that any countable subset of $(0, 1)$ is proper.

Given any sequence $(a_n)$ in $(0, 1)$, we construct an element $x$ in $(0, 1)$ with
$x \ne a_n$ for any $n$. We begin by listing the decimal expansions of the $a_n$'s; for
example:

$$\begin{aligned}
a_1 &= 0.\boxed{3}\,1\,5\,7\,2\ldots \\
a_2 &= 0.0\,\boxed{0}\,2\,6\,8\ldots \\
a_3 &= 0.9\,1\,\boxed{7}\,3\,6\ldots \\
a_4 &= 0.7\,5\,9\,\boxed{0}\,9\ldots \\
&\;\;\vdots
\end{aligned}$$

(If any $a_n$ has two representations, just include both; the resulting list is still
countable.)

Now let $x = 0.533353\ldots$, where the $n$th digit in the expansion for $x$ is taken
to be 3, unless $a_n$ happens to have 3 as its $n$th digit, in which case we take 5. (This
is why we highlighted the $n$th digit in the expansion of $a_n$. The choices 3 and 5
are more or less arbitrary here; we just want to avoid the troublesome digits 0
and 9.) Then, the decimal representation of $x$ is unique because it does not end in
all 0s or all 9s, and $x \ne a_n$ for any $n$ because the decimal expansions for $x$ and $a_n$
differ in the $n$th place. Thus we have shown that the range $\{a_n : n \ge 1\}$ is a proper subset of $(0, 1)$,
and hence that $(0, 1)$ is uncountable. □
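The diagonal construction is mechanical enough to run on a finite list. In this Python sketch, each $a_n$ is represented by a string of its decimal digits after the point (a representation we chose for illustration), and `diagonal_escape`, a hypothetical helper, returns the first digits of $x$ by the 3-or-5 rule. The four sample strings follow the example expansions of $a_1, \dots, a_4$ in the proof.

```python
def diagonal_escape(expansions: list) -> str:
    """Return the first len(expansions) digits of x.

    expansions[n] holds the digits after the decimal point of a_{n+1}; each
    string is assumed to have at least n + 1 digits. Digit n of x is 5 if
    digit n of a_{n+1} is 3, and 3 otherwise, so x differs from a_{n+1} there.
    """
    return "".join("5" if s[n] == "3" else "3" for n, s in enumerate(expansions))

listed = ["31572", "00268", "91736", "75909"]   # a_1, ..., a_4
x = diagonal_escape(listed)
print("0." + x)   # 0.5333
```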


Corollary 2.10. $\mathbb{R} \setminus \mathbb{Q}$, the set of irrational numbers, is uncountable. (Why?)





Examples 2.11 

(a) Returning to an earlier observation, recall that the set of real numbers having 
a nonunique decimal expansion relative to some base is a countable set. Thus, 
“most” real numbers have a unique decimal expansion relative to every base! 

(b) A real number that is not algebraic is called transcendental. It follows from 
Exercise 16 that “most” real numbers are transcendental, although it is not at all 
clear how we would find even one such number! This example demonstrates 
the curious power of cardinality in existential arguments. Other notions of “big” 
versus “small” sets will lend themselves equally well to similar sorts of existence 
proofs. We will repeat this theme several times before we are finished. 


EXERCISES 

17. If A is uncountable and B is countable, show that A and A \ B are equivalent. 
In particular, conclude that A \ B is uncountable. 

18. Show that the set of all real numbers in the interval $(0, 1)$ whose base 10 decimal
expansion contains no 3s or 7s is uncountable.

19. Show that the set of all functions $f : A \to \{0, 1\}$ is equivalent to $\mathcal{P}(A)$, the
power set of $A$ (i.e., the set of all subsets of $A$).

20. Prove that $\mathbb{N}$ contains uncountably many infinite subsets $N_\alpha$ such that
$N_\alpha \cap N_\beta$ is finite if $\alpha \ne \beta$. (This one’s hard!)


Here is what we have so far: A countably infinite set is small in the sense that every 
subset is either finite or else the same “size” as the whole set. An uncountable set, on 
the other hand, is certainly bigger than any countable set because a countable subset 
of an uncountable set is necessarily proper. From this point of view, countably infinite 
sets are the “smallest” infinite sets; a “smaller” subset of a countably infinite set must 
be finite. But while there is a “smallest infinity,” there is no largest - we can always 
build bigger and bigger sets. 

Given a set $A$, we write $\mathcal{P}(A)$ for the power set of $A$, the set of all subsets of $A$.
Now $A$ is clearly equivalent to a subset of $\mathcal{P}(A)$ (namely, the collection of all singletons
$\{a\}$, where $a \in A$) but, as it happens, $\mathcal{P}(A)$ is always “bigger” than $A$:

Cantor’s Theorem 2.12. No map $F : A \to \mathcal{P}(A)$ can be onto.

Proof. Given $F : A \to \mathcal{P}(A)$, consider $B = \{x \in A : x \notin F(x)\} \in \mathcal{P}(A)$. We
claim that $B \ne F(y)$ for any $y \in A$. Indeed, if $B = F(y)$, then we are faced with
the following alternatives:

$$y \in F(y) = B \implies y \notin F(y) \qquad\text{or}\qquad y \notin F(y) \implies y \in B = F(y),$$

and both lead to contradictions! □
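For a small finite set, Cantor’s argument can be checked exhaustively. This Python sketch takes $A = \{0, 1, 2\}$ as an illustrative choice, represents each map $F : A \to \mathcal{P}(A)$ as a dictionary of frozensets, runs through all $8^3 = 512$ such maps, and confirms that the diagonal set $B$ is never one of the values $F(y)$.

```python
from itertools import product

def diagonal_set(A: list, F: dict) -> frozenset:
    """Cantor's set B = {x in A : x not in F(x)}."""
    return frozenset(x for x in A if x not in F[x])

def all_subsets(A: list) -> list:
    """All subsets of A, as frozensets (the elements of P(A))."""
    return [frozenset(x for x, keep in zip(A, mask) if keep)
            for mask in product([False, True], repeat=len(A))]

A = [0, 1, 2]
subsets = all_subsets(A)                         # 2**3 = 8 subsets
misses = 0
for images in product(subsets, repeat=len(A)):   # all 8**3 = 512 maps F
    F = dict(zip(A, images))
    B = diagonal_set(A, F)
    if all(B != F[y] for y in A):                # B is not a value of F
        misses += 1
print(misses)                                    # 512: every F misses its own B
```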
While we won’t take the time to fully justify the notation, each set has a cardinal
number assigned to it, written $\operatorname{card}(A)$ and read “the cardinality of $A$,” that uniquely
specifies the number of elements of $A$. For finite sets the cardinality is literally the
number of elements, as in $\operatorname{card}\{1, \dots, n\} = n$. For countably infinite sets we use the
cardinal $\aleph_0$ (read “aleph-nought”), as in $\operatorname{card}(\mathbb{N}) = \aleph_0$. And for $\mathbb{R}$ we write $\operatorname{card}(\mathbb{R}) = \mathfrak{c}$
(for “continuum”).

We will not pursue this notation much further, but it does provide a convenient
shorthand and can actually clarify certain arguments. For example, we might write
$\operatorname{card}(A) = \operatorname{card}(B)$ to mean that the sets $A$ and $B$ are equivalent. And we might use the
formula $\operatorname{card}(A) \le \operatorname{card}(B)$ to mean that there is a one-to-one map $f : A \to B$ from $A$
into $B$. (Why is this a good choice?) But this raises the question of whether the order
that we have imposed on cardinal numbers is reasonable. In other words, if $\operatorname{card}(A) \le \operatorname{card}(B)$ and $\operatorname{card}(B) \le \operatorname{card}(A)$ both hold, is it the case that $\operatorname{card}(A) = \operatorname{card}(B)$? The
answer is “yes” and is given in the following celebrated theorem.

F. Bernstein’s Theorem 2.13. Let $A$ and $B$ be nonempty sets. If there exist a
one-to-one map $f : A \to B$, from $A$ into $B$, and a one-to-one map $g : B \to A$,
from $B$ into $A$, then there is a map $h : A \to B$ that is both one-to-one and onto.


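Proof. First, consider Figure 2.2. We would like to find a subset $S$ of $A$ so that
we may define $h$ to be $f$ on $S$ and $g^{-1}$ on $A \setminus S$.

[Figure 2.2: the sets $A$ and $B$, with the subset $g(B \setminus f(S))$ of $A$ marked.]

As the figure suggests, for this to work we will need a subset $S$ satisfying $g(B \setminus f(S)) = A \setminus S$. To this end, define
a map $H : \mathcal{P}(A) \to \mathcal{P}(A)$ by

$$H(S) = A \setminus g\big(B \setminus f(S)\big).$$

In this notation, the problem is to find a “fixed point” for $H$, that is, a set $S$ such
that $H(S) = S$.

Claim. $H$ is “increasing”; that is, $S \subset T \implies H(S) \subset H(T)$. (Just check.)

Now to see that $H$ must fix some set, let $\mathcal{C} = \{S \subset A : S \subset H(S)\}$, and let
$\hat{S} = \bigcup \mathcal{C}$. ($\hat{S}$ is the least upper bound of the sets with $S \subset H(S)$. We do not exclude
the possibility that $\mathcal{C} = \emptyset$ here; in that case we take $\hat{S} = \emptyset$.) We will show that
$H(\hat{S}) = \hat{S}$.

First, $\hat{S} \subset H(\hat{S})$. Indeed, because $S \subset \hat{S}$ for all $S \in \mathcal{C}$, we have $S \subset H(S) \subset H(\hat{S})$ for all $S \in \mathcal{C}$ and hence $\hat{S} \subset H(\hat{S})$.

It now follows that $H(\hat{S}) \subset H(H(\hat{S}))$. That is, $H(\hat{S}) \in \mathcal{C}$ and hence $H(\hat{S}) \subset \hat{S}$. Consequently, $H(\hat{S}) = \hat{S}$; that is, $g(B \setminus f(\hat{S})) = A \setminus \hat{S}$. Thus $h$, defined to be $f$ on $\hat{S}$ and $g^{-1}$ on $A \setminus \hat{S}$, maps $\hat{S}$ one-to-one onto $f(\hat{S})$ and maps $A \setminus \hat{S}$ one-to-one onto $B \setminus f(\hat{S})$; hence $h$ is a one-to-one map from $A$ onto $B$. □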
What we have actually been doing in this section is developing an “arithmetic” for
cardinal numbers. For example, it turns out that $\operatorname{card}(A \times B) = \operatorname{card}(A) \cdot \operatorname{card}(B)$, which
works just as you would suspect for finite sets. For infinite sets $A$ and $B$, we instead use
the equation to define the product of cardinal numbers. For instance, Example 2.1(b)
tells us that $\aleph_0 \cdot \aleph_0 = \aleph_0$. How might you justify the formula $\mathfrak{c} \cdot \aleph_0 = \mathfrak{c}$?

A few more examples will help to explain this “arithmetic” with cardinal numbers. 

Examples 2.14 

(a) The collection of all sequences of 0s and 1s is uncountable. How so? Well,
if $(a_n)$ is a sequence of 0s and 1s, then $\sum_{n=1}^\infty a_n/2^n$ represents an element of
$[0, 1]$ and, conversely, each element of $[0, 1]$ can be so represented. That is, the
map $(a_n) \mapsto 0.a_1a_2a_3\ldots$ (base 2) is onto. Hence the set of all 0-1 sequences,
written $\{0, 1\}^{\mathbb{N}}$, has cardinality at least that of $[0, 1]$. But, in fact, the two sets
are equivalent. (Why?)

(b) We next note that the set of all 0-1 sequences is equivalent to 𝒫(ℕ). This is easy: If A ⊂ ℕ, we define a sequence (aₙ) by aₙ = 1 if n ∈ A and aₙ = 0 if n ∉ A. The correspondence A ↔ (aₙ) is clearly both one-to-one and onto.
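The correspondence in (b) is easy to test if we truncate ℕ to {1, …, k}, since infinite sequences cannot be listed. The sketch below uses the hypothetical helpers `indicator` and `subset_of`. It checks that the 2ᵏ strings of 0s and 1s and the 2ᵏ subsets of {1, …, k} match up one-to-one.

```python
from itertools import product


def indicator(A, k):
    """The 0-1 string (a_1, ..., a_k) with a_n = 1 exactly when n is in A."""
    return tuple(1 if n in A else 0 for n in range(1, k + 1))


def subset_of(bits):
    """Inverse correspondence: the set of positions n with a_n = 1."""
    return frozenset(n for n, a in enumerate(bits, start=1) if a == 1)


k = 6
strings = list(product((0, 1), repeat=k))
# distinct strings give distinct subsets, and each string is recovered
assert len({subset_of(s) for s in strings}) == 2 ** k
assert all(indicator(subset_of(s), k) == s for s in strings)
```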

With the help of these two examples, we can make a rather fanciful calculation:

$$\mathfrak{c} = \operatorname{card}([0,1]) = \operatorname{card}(\mathcal{P}(\mathbb{N})) = \operatorname{card}(\{0,1\}^{\mathbb{N}}) = 2^{\operatorname{card}\mathbb{N}} = 2^{\aleph_0}.$$

Here we used a variation on the formula card(A × B) = card(A) · card(B), namely, card(A^B) = card(A)^{card(B)}.

Occasionally it is convenient to use a shorthand for certain sets that mirrors their cardinality. For example, if we use “2” as a shorthand for the two-point set {0, 1}, then we might write 2^ℕ in place of 𝒫(ℕ), or, more generally, 2^A in place of 𝒫(A). Along similar lines, we can prove that ℝ^∞, the collection of all real sequences, has the same cardinality as ℝ. Of course, ℝ^∞ is the same as ℝ^ℕ, the product of countably many copies of ℝ, and so

$$\operatorname{card}(\mathbb{R}^{\mathbb{N}}) = \mathfrak{c}^{\aleph_0} = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = \mathfrak{c}.$$


The Cantor Set 

We next examine an intriguing and unusual subset of ℝ called the Cantor set (or,
sometimes, Cantor’s ternary set). Our investigations here should provide us with a
natural lead-in to several of the topics that are ahead of us. We will construct an 
uncountable (hence “large”) subset of [ 0, 1 ] that is somehow also “meager.” We begin 
by applying the nested interval theorem to a particular batch of intervals. 

Consider the process of successively removing “middle thirds” from the interval [0, 1] (Figure 2.3).

[Figure 2.3: I₀ = [0, 1]; remove J₁ = (1/3, 2/3) to obtain I₁; remove J₂ = (1/9, 2/9) ∪ (7/9, 8/9) to obtain I₂; then remove four “middle third” intervals, each of length 1/27.]

We continue this process inductively. At the nth stage we construct Iₙ from Iₙ₋₁ by removing 2ⁿ⁻¹ disjoint, open “middle third” intervals from Iₙ₋₁, each of length 3⁻ⁿ; we will call this discarded set Jₙ. Thus, Iₙ is the union of 2ⁿ closed subintervals of Iₙ₋₁, and the complement of Iₙ in [0, 1] is J₁ ∪ ⋯ ∪ Jₙ. The Cantor set A is defined as the set of points that still remain at the end of this process, in other words, the “limit” of the sets Iₙ. More precisely, A = ⋂_{n=1}^∞ Iₙ. It follows from the nested interval theorem that A ≠ ∅, but notice that A is at least countably infinite. The endpoints of each Iₙ are in A:

0, 1, 1/3, 2/3, 1/9, 2/9, … ∈ A.

We will refer to these points as the endpoints of A, that is, all of the points in A of the form a/3ⁿ for some integers a and n.
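A minimal sketch of the construction, using Python's `fractions` module so that every endpoint is exact. The helper name `cantor_level` is hypothetical. It returns Iₙ as a list of closed intervals (a, b) by splitting each interval of Iₙ₋₁ into its left and right thirds.

```python
from fractions import Fraction


def cantor_level(n):
    """Closed intervals (a, b) making up I_n, built by removing open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_level = []
        for a, b in intervals:
            third = (b - a) / 3
            next_level.append((a, a + third))   # left piece
            next_level.append((b - third, b))   # right piece
        intervals = next_level
    return intervals
```

For example, `cantor_level(1)` gives the intervals [0, 1/3] and [2/3, 1]. Each Iₙ has 2ⁿ intervals of length 3⁻ⁿ, so its total length is (2/3)ⁿ.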

As we shall see presently, A is actually uncountable! This is more than a little 
surprising. Just try to imagine how terribly sparse the next few levels of the “middle 
thirds” diagram would look on the page. Adding even a few more levels defies the limits 
of typesetting! For good measure we will give two proofs that A is uncountable, the 
first being somewhat combinatorial. 

Notice that each subinterval of Iₙ₋₁ results in two subintervals of Iₙ (after discarding a middle third). We label these two new intervals L and R (for left and right) as in Figure 2.4.


[Figure 2.4: levels I₀, I₁, I₂, I₃ of the construction, with the two subintervals of each interval labeled L and R.]


As we progress down through the levels of the diagram toward the Cantor set (somewhere far below), imagine that we “step down” from one level to the next by repeatedly choosing either a step to the left (landing on an L interval in the next level below) or a step to the right (landing on an R interval). At each stage we are only allowed to step down to a subinterval of the interval we are presently on; jumping across “gaps” is not allowed! Thus, each string of choices, LRLRRLLRLLLR…, describes a unique “path” from the top level I₀ down to the bottom level A. The Cantor set, then, is quite literally the “dust” at the end of the trail. Said another way, each such “path” determines a unique sequence of nested subintervals, one from each level, whose intersection is a single point of A.

Conversely, each point x ∈ A lies at the end of exactly one such path, because at any given level there is only one possible subinterval of Iₙ on our diagram, call it Iₙ(x), that contains x. The resulting sequence of intervals (Iₙ(x)) is clearly nested. (Why?) Thus, the Cantor set A is in one-to-one correspondence with the set of all paths, that is, the set of all sequences of Ls and Rs. Of course, any two choices would have done just as well, so we might also say that A is equivalent to the set of all sequences of 0s and 1s, a set we already know to be uncountable. Here is what this means:

card(A) = card(2^ℕ) = card([0, 1]).
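A path can be followed mechanically: each L keeps the left third of the current interval, and each R keeps the right third. The sketch below (the helper name `path_interval` is hypothetical) returns the interval of Iₙ reached by a path of length n. Distinct paths of the same length land on disjoint intervals, which is the one-to-one part of the correspondence above.

```python
from fractions import Fraction


def path_interval(path: str):
    """Follow a string of 'L'/'R' steps down from I_0 = [0, 1].

    Each step keeps the left or right third of the current interval,
    so a path of length n lands on one of the 2**n intervals of I_n.
    """
    a, length = Fraction(0), Fraction(1)
    for step in path:
        length /= 3
        if step == "R":
            a += 2 * length          # skip the left third and the removed middle third
        elif step != "L":
            raise ValueError("paths use only 'L' and 'R'")
    return a, a + length
```

For instance, `path_interval("RL")` is the interval [2/3, 7/9]. Reading L as the ternary digit 0 and R as the digit 2 anticipates the characterization of A given below.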

Absolutely amazing! The Cantor set is just as “big” as [ 0, 1 ] and yet it strains the 
imagination to picture such a sparse set of points. 

Before we give our second proof that A is uncountable, let’s see why A is “small” (in at least one sense). We will show that A has “measure zero”; that is, the “measure” or “total length” of all of the intervals in its complement [0, 1] \ A is 1. Here’s why: By induction, the total length of the 2ⁿ⁻¹ disjoint intervals comprising Jₙ (the set we discard at the nth stage) is 2ⁿ⁻¹/3ⁿ, and so the total length of [0, 1] \ A must be

$$\sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = \frac{1}{3}\sum_{n=0}^{\infty}\left(\frac{2}{3}\right)^{n} = \frac{1}{3}\cdot\frac{1}{1 - 2/3} = 1.$$



We have discarded everything!? And left uncountably many points behind!? How 
bizarre! This simultaneous “bigness” and “smallness” is precisely what makes the 
Cantor set so intriguing. The exercises will supply even more ways to say that A is both 
“big” and “small.” 
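The geometric series above can be checked with exact rational arithmetic. The first n terms add up to 1 − (2/3)ⁿ, which is exactly what is left of [0, 1] after removing Iₙ. The helper name `removed_length` is illustrative.

```python
from fractions import Fraction


def removed_length(n):
    """Total length of J_1 ∪ ... ∪ J_n: the sum of 2**(k-1) / 3**k for k = 1, ..., n."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))
```

The remaining length (2/3)ⁿ tends to 0, so the discarded length tends to 1.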

Our second proof that A is uncountable is based on an equivalent characterization of A in terms of ternary (base 3) decimals. Recall that each x in [0, 1] can be written, in possibly more than one way, as x = 0.a₁a₂a₃⋯ (base 3), where each aₙ = 0, 1, or 2. This three-way choice for decimal digits (base 3) corresponds to the three-way splitting of intervals that we saw earlier. To see this, let us consider a few specific examples. For instance, the three cases a₁ = 0, 1, or 2 correspond to the three intervals [0, 1/3], (1/3, 2/3), and [2/3, 1], as in Figure 2.5.


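[Figure 2.5: the interval [0, 1] split into [0, 1/3], (1/3, 2/3), and [2/3, 1], corresponding to a₁ = 0, a₁ = 1, and a₁ = 2; the outer two make up I₁.]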


There is some ambiguity at the endpoints: 


1/3 = 0.1 (base 3) = 0.0222… (base 3),
2/3 = 0.2 (base 3) = 0.1222… (base 3),
1 = 1.0 (base 3) = 0.2222… (base 3),


but each of these ambiguous cases has at least one representation with a₁ in the proper range. Next, Figure 2.6 shows the situation for I₂ (but this time ignoring the discarded intervals).

[Figure 2.6: the four intervals of I₂, namely [0, 1/9], [2/9, 1/3], [2/3, 7/9], and [8/9, 1], labeled by the digit pairs a₁, a₂ ∈ {0, 2}.]

Again, some confusion is possible at the endpoints:

1/9 = 0.01 (base 3) = 0.00222… (base 3),

8/9 = 0.22 (base 3) = 0.21222… (base 3).

We will take these few examples as proof of the following 

Theorem 2.15. x ∈ A if and only if x can be written as x = Σ_{n=1}^∞ aₙ/3ⁿ, where each aₙ is either 0 or 2.

Thus the Cantor set consists of those points in [0, 1] having some base 3 decimal representation that excludes the digit 1. Knowing this we can list all sorts of elements of A. For example, 1/4 ∈ A because 1/4 = 0.020202… (base 3). Theorem 2.15 also leads to another proof that A is uncountable; or, rather, it gives us a new way of writing the old proof. The first proof used sequences of 0s and 1s, and now we find ourselves with sequences of 0s and 2s; the connection isn’t hard to guess.
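For rational points, Theorem 2.15 turns into a finite test. We read off ternary digits one at a time by zooming in: on [0, 1/3] the digit is 0 and we replace x by 3x; on [2/3, 1] the digit is 2 and we replace x by 3x − 2; landing strictly inside (1/3, 2/3) forces a digit 1. For a rational x these values all have denominators dividing that of x, so they eventually repeat, and we can stop there. The function name `in_cantor_set` is hypothetical, and the sketch handles rational inputs only.

```python
from fractions import Fraction


def in_cantor_set(x) -> bool:
    """Decide whether a rational x lies in the Cantor set (ternary digits 0 and 2 only)."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:              # the orbit of a rational is eventually periodic
        seen.add(x)
        if x <= Fraction(1, 3):       # next ternary digit 0
            x = 3 * x
        elif x >= Fraction(2, 3):     # next ternary digit 2
            x = 3 * x - 2
        else:                         # strictly inside a removed middle third
            return False
    return True
```

Here `in_cantor_set(Fraction(1, 4))` is `True`, matching 1/4 = 0.0202… (base 3), while `in_cantor_set(Fraction(4, 9))` is `False` because 4/9 = 0.11 (base 3).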

Corollary 2.16. A is uncountable; in fact, A is equivalent to [ 0, 1 ]. 

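Proof. By altering our notation we can easily display a correspondence between A and [0, 1]. Each x ∈ A may be written x = Σ_{n=1}^∞ 2bₙ/3ⁿ, where bₙ = 0 or 1, and now we define the Cantor function f : A → [0, 1] by

$$f\left(\sum_{n=1}^{\infty} \frac{2b_n}{3^n}\right) = \sum_{n=1}^{\infty} \frac{b_n}{2^n}.$$

That is,

$$f\bigl(0.a_1a_2a_3\cdots\ (\text{base } 3)\bigr) = 0.\tfrac{a_1}{2}\tfrac{a_2}{2}\tfrac{a_3}{2}\cdots\ (\text{base } 2) \qquad (a_n = 0, 2).$$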

Now f is clearly onto, and hence we have a second proof that A is uncountable. (Why?) But f isn’t one-to-one; here’s why:

f(1/3) = f(0.0222… (base 3)) = 0.0111… (base 2)
       = 0.1 (base 2) = f(0.2 (base 3)) = f(2/3).

The same phenomenon occurs at each pair of endpoints of any discarded “middle third” interval (i.e., a subinterval of Jₙ):

f(1/9) = f(0.00222… (base 3)) = 0.00111… (base 2)
       = 0.01 (base 2) = f(0.02 (base 3)) = f(2/9).


It is easy to see that f is increasing; that is, if x, y ∈ A with x < y, then f(x) ≤ f(y). We leave it as an exercise to check that f(x) = f(y) if and only if x and y are endpoints of a discarded “middle third” interval (see Exercise 26). Thus, f is one-to-one except at the endpoints of A (a countable set), where it’s two-to-one. It follows that A is equivalent to [0, 1]. (How?) □
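At a rational point of A the Cantor function can be evaluated exactly. Zoom in as for ternary digits and record a binary digit 1 for each ternary digit 2. Once the orbit repeats, the binary expansion is eventually periodic, so its value comes from a finite sum plus a geometric series. This is an illustrative sketch (the name `cantor_function` is hypothetical) that handles rational points of A only.

```python
from fractions import Fraction


def cantor_function(x) -> Fraction:
    """Evaluate f(0.a1 a2 ... (base 3)) = 0.(a1/2)(a2/2)... (base 2) at a rational x in A."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    bits, index = [], {}              # index[x_k] = k for the orbit x_0, x_1, ...
    while x not in index:
        index[x] = len(bits)
        if x <= Fraction(1, 3):       # ternary digit 0 -> binary digit 0
            bits.append(0)
            x = 3 * x
        elif x >= Fraction(2, 3):     # ternary digit 2 -> binary digit 1
            bits.append(1)
            x = 3 * x - 2
        else:
            raise ValueError("x is not in the Cantor set")
    j, m = index[x], len(bits)        # bits[j:m] repeat forever
    block = sum(Fraction(b, 2 ** (k - j + 1)) for k, b in enumerate(bits[j:], start=j))
    repeating = block / (1 - Fraction(1, 2 ** (m - j)))   # geometric series
    head = sum(Fraction(b, 2 ** (k + 1)) for k, b in enumerate(bits[:j]))
    return head + repeating / 2 ** j
```

This reproduces the two-to-one behavior at endpoints: `cantor_function(Fraction(1, 3))` and `cantor_function(Fraction(2, 3))` both return 1/2.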


EXERCISES 

▷ 21. Show that any ternary decimal of the form 0.a₁a₂⋯aₙ11 (base 3), i.e., any finite-length decimal ending in two (or more) 1s, is not an element of A.

▷ 22. Show that A contains no (nonempty) open intervals. In particular, show that if x, y ∈ A with x < y, then there is some z ∈ [0, 1] \ A with x < z < y. (It follows from this that A is nowhere dense, which is another way of saying that A is “small.”)

▷ 23. The endpoints of A are those points in A having a finite-length base 3 decimal expansion (not necessarily in the proper form), that is, all of the points in A of the form a/3ⁿ for some integers n and 0 ≤ a ≤ 3ⁿ. Show that the endpoints of A other than 0 and 1 can be written as 0.a₁a₂⋯aₙ₊₁ (base 3), where each aₖ is 0 or 2, except aₙ₊₁, which is either 1 or 2. That is, the discarded “middle third” intervals are of the form (0.a₁a₂⋯aₙ1, 0.a₁a₂⋯aₙ2), where both entries are points of A written in base 3.

24. Show that A is perfect; that is, every point in A is the limit of a sequence of
distinct points from A. In fact, show that every point in A is the limit of a sequence 
of distinct endpoints. 

25. Define g : ℝ → ℝ by g(x) = 1 if x ∈ A, and g(x) = 0 otherwise. At which points of ℝ is g continuous?

▷ 26. Let f : A → [0, 1] be the Cantor function (defined above) and let x, y ∈ A with x < y. Show that f(x) ≤ f(y). If f(x) = f(y), show that f(x) has two distinct binary decimal expansions. Finally, show that f(x) = f(y) if and only if x and y are “consecutive” endpoints of the form x = 0.a₁a₂⋯aₙ1 and y = 0.a₁a₂⋯aₙ2 (base 3).

27. Fix n ≥ 1, and consider the 2ⁿ component subintervals of the nth level Cantor set Iₙ. If x, y ∈ A with |x − y| < 3⁻ⁿ, show that x and y are in the same component. For this same pair of points show that |f(x) − f(y)| ≤ 2⁻ⁿ.


The observation made in Exercise 26 enables us to extend the definition of the Cantor function f to all of [0, 1] in an obvious way: We take f to be an appropriate constant on each of the open intervals that make up [0, 1] \ A. For example, we would set f(x) = f(1/3) = 1/2 for each x in the interval (1/3, 2/3) and f(x) = f(1/9) = 1/4 for each x in (1/9, 2/9). See Figure 2.7.
Formally, we define f(x) = sup{f(y) : y ∈ A, y ≤ x} for x ∈ [0, 1] \ A. The new function f : [0, 1] → [0, 1] is still increasing (why?) and is actually continuous! (We will prove this in the next section.) Some authors refer to this extension as the Cantor-Lebesgue function or Lebesgue’s singular function. We will simply call it the Cantor function. It is called a singular function because f′ = 0 at almost every point in [0, 1]. That is, f′ = 0 on [0, 1] \ A, a set of measure 1. But we are getting ahead of ourselves.
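To evaluate the extended Cantor function numerically, this sketch relies on its self-similarity. That is a standard property not proved in the text above: F(x) = F(3x)/2 on [0, 1/3], F = 1/2 on [1/3, 2/3], and F(x) = 1/2 + F(3x − 2)/2 on [2/3, 1]. After `depth` zoom steps the true value is pinned down to within 2^(−depth). Points that reach the middle third return an exact value. The name `cantor_extended` and the default depth are illustrative choices.

```python
from fractions import Fraction


def cantor_extended(x, depth: int = 60) -> Fraction:
    """Lower approximation, within 2**-depth, of the extended Cantor function F on [0, 1].

    Invariant: F(original x) = value + scale * F(current x).
    """
    x = Fraction(x)
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    value, scale = Fraction(0), Fraction(1)
    for _ in range(depth):
        if x < Fraction(1, 3):          # left third: F(x) = F(3x) / 2
            scale /= 2
            x = 3 * x
        elif x <= Fraction(2, 3):       # closed middle third: F = 1/2 there
            return value + scale / 2
        else:                           # right third: F(x) = 1/2 + F(3x - 2) / 2
            value += scale / 2
            scale /= 2
            x = 3 * x - 2
    return value                        # the true value lies in [value, value + scale]
```

For example, any x in (1/9, 2/9), such as 3/20, returns exactly 1/4, consistent with f(1/9) = 1/4 above.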


EXERCISES 

28. Let f : A → [0, 1] be the Cantor function (as originally defined). Check that f(x) = sup{f(y) : y ∈ A, y ≤ x} for any x ∈ A.

▷ 29. Prove that the extended Cantor function f : [0, 1] → [0, 1] (as defined above) is increasing. [Hint: Consider cases.]


The construction of the Cantor set admits all sorts of generalizations. For example, suppose that we fix α with 0 < α < 1 and we repeat our “middle thirds” construction except that at the nth stage each of the open intervals we remove is now taken to have length α3⁻ⁿ. (And we still want these to be in the “middle” of an interval from the current level; it is important that the remaining closed intervals turn out to be nested.) Figure 2.8 shows the first few levels of this generalized construction in the case α = 3/5.
The limit of this process, called a generalized Cantor set, is very much like the ordinary Cantor set. It is uncountable, perfect, nowhere dense, and so on, but this one now has nonzero measure. We leave it as an exercise to check that the generalized Cantor set with parameter α has measure β = 1 − α. We label these sets according to their measure; that is, we write A_β to mean the generalized Cantor set with measure β.
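A sketch of the generalized construction with exact arithmetic (the helper name `generalized_cantor_level` is hypothetical). At stage k, an open gap of length α/3ᵏ is removed from the center of each current interval. Through n stages the removed length is Σ_{k=1}^n 2ᵏ⁻¹α/3ᵏ = α(1 − (2/3)ⁿ), which tends to α.

```python
from fractions import Fraction


def generalized_cantor_level(alpha, n):
    """Closed intervals left after n stages, removing centered open gaps of length alpha / 3**k at stage k."""
    alpha = Fraction(alpha)
    intervals = [(Fraction(0), Fraction(1))]
    for k in range(1, n + 1):
        gap = alpha / 3 ** k
        next_level = []
        for a, b in intervals:
            if b - a <= gap:
                raise ValueError("the gap does not fit inside the interval")
            mid = (a + b) / 2
            next_level.append((a, mid - gap / 2))    # left piece
            next_level.append((mid + gap / 2, b))    # right piece
        intervals = next_level
    return intervals
```

With α = 3/5 the first stage leaves [0, 2/5] ∪ [3/5, 1]. The boundary case α = 1 reproduces the ordinary middle-thirds construction.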


EXERCISES 

30. Check that the construction of the generalized Cantor set with parameter α, as described above, leads to a set of measure 1 − α; that is, check that the discarded intervals now have total length α.

31. Now that we know the description of A in terms of ternary decimals, it might 
be interesting to consider a similar construction using another base. For example, fix 
an integer p > 3 (to use as the base) and an integer 0 < d < p (as the omitted digit). 
Describe the set of all points in [ 0, 1 ] that have some base p decimal expansion that 
excludes the digit d. Is it uncountable? Does it have measure zero? 


The Cantor set satisfies another rather curious property: The set of all possible differences of pairs of elements of A fills up the interval [−1, 1]; in symbols, A − A = {y − x : x, y ∈ A} = [−1, 1]. The original proof, due to Steinhaus, is based on a clever geometric observation. The claim is that the equation y − x = b has a solution x, y ∈ A for any −1 ≤ b ≤ 1. That is, for any −1 ≤ b ≤ 1, the line y = x + b must pass through the set A × A.

Now the set A × A can be constructed inside the square [0, 1] × [0, 1] in much the same way that A is constructed inside [0, 1]. We begin with the full square A₀, remove “middle thirds” both horizontally and vertically, and arrive at the set of four subsquares A₁ = ([0, 1/3] ∪ [2/3, 1]) × ([0, 1/3] ∪ [2/3, 1]). Next “cross out” the middle thirds, both horizontally and vertically, from these four squares to arrive at 16 smaller subsquares, a set that we will call A₂. And continue. The “limit” of this process is the set A × A = ⋂_{n=1}^∞ Aₙ.

To see that a line of the form y = x + b, where −1 ≤ b ≤ 1, must pass through A × A, it is enough to show that y = x + b hits every Aₙ, for then we could apply a version of the nested interval theorem in ℝ² to finish the proof. (We will see just such a theorem in Chapter Seven.) For now we will settle for the following “visual” proof. Convince yourself that any line of slope 1 that passes through the square [0, 1] × [0, 1] must also pass through each Aₙ by considering the following pictures (showing A₁ on the left, A₂ on the right, and a “worst-case” line drawn through each square). Note that by “scaling” it is enough to understand just the first case; see Figure 2.9.
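The finite-stage claim can also be checked exactly for small n, though this is no substitute for the argument. Since Aₙ = Iₙ × Iₙ, the line y = x + b meets Aₙ exactly when b ∈ Iₙ − Iₙ. The sketch below (hypothetical helper names) builds Iₙ − Iₙ as a union of closed intervals and sweeps across it to confirm that it covers [−1, 1].

```python
from fractions import Fraction


def cantor_left_endpoints(n):
    """Left endpoints of the 2**n closed intervals of I_n, each of length 3**-n."""
    lefts = [Fraction(0)]
    for k in range(1, n + 1):
        step = Fraction(2, 3 ** k)          # offset of the right child from the left child
        lefts = [a for x in lefts for a in (x, x + step)]
    return lefts


def differences_cover(n):
    """True if I_n - I_n contains [-1, 1], i.e. every line y = x + b with |b| <= 1 meets I_n x I_n."""
    h = Fraction(1, 3 ** n)
    lefts = cantor_left_endpoints(n)
    # x in [p, p + h] and y in [q, q + h] give y - x anywhere in [q - p - h, q - p + h]
    pieces = sorted((q - p - h, q - p + h) for p in lefts for q in lefts)
    reach = Fraction(-1)                    # [-1, reach] is covered so far
    for lo, hi in pieces:
        if lo > reach:
            return False                    # a gap just above reach is missed
        reach = max(reach, hi)
    return reach >= 1
```

For n = 1 the three difference intervals [−1, −1/3], [−1/3, 1/3], and [1/3, 1] already cover [−1, 1], which is the “first case” the scaling argument reduces to.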


Monotone Functions 


As we saw in the first chapter, monotone functions are reasonably well behaved. In 
particular, a monotone function has (at worst) only jump discontinuities. It follows that 
a monotone function must have lots of points of continuity. Here’s why: 
Thus, f is increasing. Next we consider this formula in each of the cases x = xₖ and y = xₖ. First,

$$x = x_k < y \implies f(y) = f(x_k) + \sum_{x_k < x_n \le y} e_n.$$

Claim. f(xₖ+) = f(xₖ); i.e.,

$$\lim_{y \to x_k^+} \sum_{x_k < x_n \le y} e_n = 0 \quad\text{because}\quad \sum_{n=N}^{\infty} e_n \to 0 \ \text{as } N \to \infty.$$

(Given N, once y is close enough to xₖ, the interval (xₖ, y] contains none of x₁, …, x_{N−1}.)

And, in the second case,

$$x < x_k = y \implies f(x_k) = f(x) + \sum_{x < x_n \le x_k} e_n \ge f(x) + e_k.$$

Claim. f(xₖ−) = f(xₖ) − eₖ; i.e.,

$$\lim_{x \to x_k^-} \sum_{x < x_n \le x_k} e_n = e_k.$$

Thus, f(xₖ−) + eₖ = f(xₖ) = f(xₖ+) and f(xₖ+) − f(xₖ−) = eₖ.

The proof that f is continuous at each x ∈ ℝ \ D is similar.
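For a concrete instance of this kind of construction, take the hypothetical choice D = {1/n : n ≥ 1} with eₙ = 2⁻ⁿ. Then f(x) = Σ_{xₙ ≤ x} eₙ has the closed form 2^(1−⌈1/x⌉) for x > 0, and f(x) = 0 for x ≤ 0. This makes the jumps easy to inspect exactly. The function has a jump of size 2⁻ᵏ at each point 1/k, is right-continuous there, and is continuous at 0, a point outside D that is a limit of points of D.

```python
import math
from fractions import Fraction


def jump_function(x) -> Fraction:
    """f(x) = sum of e_n over all n with x_n <= x, for x_n = 1/n and e_n = 2**-n (n >= 1)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)              # no point 1/n lies at or below x
    N = math.ceil(1 / x)                # 1/n <= x exactly when n >= N
    return Fraction(1, 2 ** (N - 1))    # sum over n >= N of 2**-n equals 2**(1 - N)
```

For example, `jump_function(Fraction(1, 3)) - jump_function(Fraction(1, 3) - Fraction(1, 10**6))` is 1/8, the weight e₃ attached to x₃ = 1/3.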


EXERCISES 

32. Deduce from Theorem 2.17 that a monotone function f : ℝ → ℝ has points of continuity in every open interval.

33. Let f : [a, b] → ℝ be monotone. Given n distinct points a < x₁ < x₂ < ⋯ < xₙ < b, show that Σ_{i=1}^n |f(xᵢ+) − f(xᵢ−)| ≤ |f(b) − f(a)|. Use this to give another proof that f has at most countably many (jump) discontinuities.

34. Let D = {x₁, x₂, …}, and let eₙ > 0 with Σ_{n=1}^∞ eₙ < ∞. Define f(x) = Σ_{xₙ ≤ x} eₙ (as above). Check the following: (i) f is discontinuous at the points of D; (ii) f is right-continuous everywhere; and (iii) f is continuous at each point x ∈ ℝ \ D. How might this construction be modified so as to yield a strictly increasing function with these same properties?

35. Let f : [a, b] → ℝ be increasing, and let (xₙ) be an enumeration of the discontinuities of f. For each n, let aₙ = f(xₙ) − f(xₙ−) and bₙ = f(xₙ+) − f(xₙ) be the left and right “jumps” in the graph of f, where aₙ = 0 if xₙ = a and bₙ = 0 if xₙ = b. Show that Σₙ aₙ ≤ f(b) − f(a) and Σₙ bₙ ≤ f(b) − f(a).

36. In the notation of Exercise 35, define h(x) = Σ_{xₙ ≤ x} aₙ + Σ_{xₙ < x} bₙ. Show that h is increasing and that g = f − h is both continuous and increasing. Thus, each increasing function f can be written as the sum of a continuous increasing function g and a “pure jump” function h.


Notes and Remarks

For an infinitely enjoyable discussion of the infinite, see the article “Infinity” by Hans 
Hahn [1956a]. The clever proof of Theorem 2.3, and more, can be found in Newman 
and Parsons [1988]. For an alternate proof of Corollary 2.7, see Campbell [1986].

Countable (and uncountable) sets were introduced by Cantor. Indeed, most of the results in this chapter are due to Cantor himself. In particular, Corollary 2.7, Theorem 2.9, and Theorem 2.12 are due to Cantor; see Dunham [1990]. The statement of Theorem 2.13 originated as an open question in one of Cantor’s seminars. You will often see it referred to variously as the Cantor-Bernstein theorem or as the Schröder-Bernstein theorem. According to Dudley [1989], full credit should go to Felix Bernstein, who was a 19-year-old student at the time! At any rate, Hausdorff [1937] refers to it as Bernstein’s theorem. The proof given here is taken from an exercise in Willard [1970] but is probably much older.

The proof of Theorem 2.12 may remind you of Russell’s paradox. Briefly, Russell’s 
paradox demonstrates that there are limitations on what may be regarded as a set. As 
Russell would ask, is the collection U of all sets again a set? If so, then we might consider 
the set B = {A ∈ U : A ∉ A}. Now if we accept U as a set, then the “rules” of set
operations say that we are stuck with accepting B as a set, too. With that decision made,
the “rules” likewise permit us to ask the question, is B ∈ B? A moment’s reflection on
what this means will have your head spinning! Evidently not everyone gets to be a set. 
We have taken the easy way out and left the concept of “set” as a primitive, undefined 
notion. Not to worry, though; we are on solid ground. Trust me! 

Although Example 2.11(b) might suggest that it is impossible to construct a single transcendental number, that is not entirely true. Since the algebraic numbers are countable, the “diagonalization” technique used in the proof of Theorem 2.9, if carefully applied, would yield a specific transcendental number. Better still, it is actually possible to display uncountably many transcendental numbers: In 1844, Liouville first proved that transcendental numbers exist by showing that any number of the form Σ_{n=1}^∞ aₙ/10^{n!}, where the aₙ are integers with 1 ≤ aₙ ≤ 9, is transcendental. However, not all transcendental numbers are of this form. Following this discovery, Hermite showed in 1873 that e is transcendental, and Lindemann showed in 1882 that π is transcendental. For more details, see Oxtoby [1971], Stromberg [1981], and Kline [1972]. For more on what mathematicians mean by the word “impossible,” see Davis [1986].

In addition to the books by Dudley [1989] and Hausdorff [1937], you can find 
more abstract set theory in the books by Boas [1960], Folland [1984], Hewitt and 
Stromberg [1965], Halmos [1960], Kaplansky [1977], Kolmogorov and Fomin [1970], 
and Torchinsky [1988]. Several of the references include a bit of history, too. The 
books by Willard and Dudley, for example, have copious notes and references to original 
works. Kline [1972] is a mammoth source of information about mathematics in general. 
Hawkins [1970] has a detailed exposition of the events leading to the great “revolution”
in analysis following the work of Riemann, Weierstrass, and Cantor (roughly speaking, 
the years 1875-1925). Manheim [1964] traces the early development of abstract set 
theory and point set topology. 
Cantor first mentioned “the” Cantor set in connection with the concept of “perfect” sets in Cantor [1883], but the set itself was not discovered by Cantor. Examples of this type, including “the” Cantor set, had already been introduced by H. J. S. Smith in connection with two constructions for nowhere dense sets in H. J. S. Smith [1875]. According to Hawkins [1970], Smith’s results did not become well known until the 1880s. The title of Smith’s paper, “On the integration of discontinuous functions,” highlights the connection between abstract set theory and integration. Interest in infinite sets and “pathological” sets was born out of the study of Riemann’s integrability condition and its relation to Fourier’s work on trigonometric series. (See Hawkins [1970], Manheim [1964], Rogosinski [1950], and the series of articles by Dauben [1971, 1974, 1983].) We will have more to say about this in Part Three; in any event, the Cantor set will remain an important example throughout this course.

The “visual” proof that A − A = [−1, 1] is originally due to Steinhaus [1917]. For a proof based on the ternary decimal representation of A, see Randolph [1940]. For more on the Cantor set and generalized Cantor sets, see Chae [1980], Randolph [1968], Coppel [1983], and Majumder [1965]. For more on the Cantor function, see Chalice [1991] and Hille and Tamarkin [1929].

The construction used in the converse to Theorem 2.17 is based on the presentation in
Rudin [1953]. Our results about monotone functions will turn out to be very useful later 
in the course when we discuss “the problem of moments.” This famous problem has 
its roots in mathematical physics, but it is of consequence to probability and statistics 
as well. We will postpone further discussion of the problem; for more details and a 
few clues about what is ahead, see the short note “Stieltjes on the Stieltjes integral” in 
Birkhoff [1973]. 



CHAPTER THREE 


Metrics and Norms 


In the beginning there were operations (hundreds of them): limits, derivatives, integrals, sums; all of the many operations on functions, sequences, sets, vectors, matrices, and whatever else you might have encountered in calculus. The hallmark of twentieth-century mathematics is that we now view these operations as functions defined on entire collections of “abstract” objects rather than as specific actions taken on individual objects, one at a time. Maurice Fréchet, in a short expository article from 1950, had this to say (the italics are his own):

In modern times it has been recognized that it is possible to elaborate full mathematical theories dealing with elements of which the nature is not specified, that is, with abstract elements. A collection of these abstract elements will be called an abstract set. If to this set there is added some rule of association of these elements, or some relation between them, the set will be called an abstract space. A natural generalization of function consists in associating with any element x of an abstract set E a number f(x). Functional analysis is the study of such “functionals” f(x). More generally, general analysis is the theory of the transformations y = U[x] of an element x of an abstract set E into an element y of another (or the same) abstract set F. It is obvious that the study of general analysis should be preceded by a discussion of abstract spaces.

It is necessary to keep in mind that these notions are not of a metaphysical nature;
that when we speak of an abstract element we mean that the nature of this element is 
indifferent, but we do not mean at all that this element is unreal. Our theory will apply 
to all elements; in particular, applications of it may be made to the natural sciences. Of 
course, due attention must be paid to any properties which depend essentially on the nature 
of any special category of elements under investigation. 

Early examples of this type of abstraction appeared in 1906 in Fréchet’s thesis, “Sur quelques points du calcul fonctionnel,” in which he introduced a notion of distance defined on abstract sets of points. In particular, Fréchet considered the collection C[0, 1], consisting of all continuous real-valued functions defined on the closed interval [0, 1], where we measure the distance between two functions by taking the maximum vertical distance between their graphs; that is, dist(f, g) = max_{0≤t≤1} |f(t) − g(t)|. (This distance function was actually well known in 1906, but Fréchet was the first to view it as a small part of a much bigger picture.) Given a notion of distance between elements of C[0, 1], it makes sense to ask questions like: Is integration continuous? That is, are the numbers ∫₀¹ f(t) dt and ∫₀¹ g(t) dt “close” whenever f and g are “close”?
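The estimate |∫₀¹ f − ∫₀¹ g| ≤ ∫₀¹ |f − g| ≤ dist(f, g) suggests that the answer is yes. The sketch below illustrates this numerically rather than proving it. The helpers `sup_distance` (a grid estimate of Fréchet's maximum distance) and `trapezoid` (a trapezoid-rule integral) are hypothetical names, and the grid size is an arbitrary choice. Both helpers sample the same grid, and the trapezoid weights are positive and sum to 1. So the difference of the two trapezoid sums can never exceed the grid estimate of the distance.

```python
import math


def sup_distance(f, g, n=10_000):
    """Grid estimate of max |f(t) - g(t)| over [0, 1] (a lower bound for the true maximum)."""
    return max(abs(f(k / n) - g(k / n)) for k in range(n + 1))


def trapezoid(f, n=10_000):
    """Trapezoid-rule estimate of the integral of f over [0, 1] on the same grid."""
    inner = sum(f(k / n) for k in range(1, n))
    return (0.5 * f(0.0) + inner + 0.5 * f(1.0)) / n


f = math.exp
g = lambda t: math.exp(t) + 0.001 * math.sin(40 * t)   # a small wiggle added to f
print(sup_distance(f, g), abs(trapezoid(f) - trapezoid(g)))
```

Here the two functions are about 0.001 apart in this distance, and their integrals differ by even less.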

This new point of view proved to have immediate applications; in that same year Friedrich Riesz used Fréchet’s ideas to give a new proof of a result of Erhardt Schmidt, stating that any orthonormal system in C[0, 1] must be countable. In fact, Riesz extended this result to another collection of functions and in so doing introduced the Lᵖ spaces. Riesz’s techniques revolutionized the study of trigonometric series. To say that Fréchet’s ideas caught on would be an understatement; the study of modern analysis would be lost without them. By 1928, Fréchet had compiled a monograph on his research on abstract spaces entitled Les Espaces Abstraits. (The word “space” has come to connote an abstract set of points that carries with it some additional structure.) Much of the terminology we will use, and certainly most of our examples of abstract spaces, can be found in Fréchet’s monograph. By mathematical standards, 1928 is not so very long ago.


Metric Spaces 

Given a set M, how might we define a distance function on M? What would we want a “reasonable” distance to do? Certainly we would want our distance to be (defined and) nonnegative for any pair of points in M. Let’s start there: Let d : M × M → [0, ∞) be a nonnegative, real-valued function defined on all pairs of elements from M. We would probably expect to have d(x, x) = 0 for any x ∈ M. And d(x, y) = 0 should mean that x = y. We would most likely want our distance to also satisfy d(x, y) = d(y, x) for all pairs of points x, y ∈ M. Anything else? Well, in the hope of preserving at least a bit of the geometry granted by the familiar distances in ℝ and ℝⁿ, we might also require one last property. The distance function should satisfy the triangle inequality: For each triple of points x, y, z in M, we ask that d(x, y) ≤ d(x, z) + d(z, y). The triangle inequality is the embodiment of that old saw, “The shortest distance between two points is a straight line.” This timid little inequality will turn out to be immensely valuable.

A function d on M x M satisfying the following properties is called a metric on M. 

(i) 0 ≤ d(x, y) < ∞ for all pairs x, y ∈ M.

(ii) d(x, y) = 0 if and only if x = y.

(iii) d(x, y) = d(y, x) for all pairs x, y ∈ M.

(iv) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ M.

A function d on M x M that satisfies all of the above save item (ii) is sometimes called 
a pseudometric. Thus, a pseudometric will permit distinct points to be 0 distance apart. 

The couple (M, d), consisting of a set M together with a metric d defined on M, is
called a metric space. If a particular metric on M is understood, or if the argument at 
hand works equally well for any metric, we may forego this formality and simply refer 
to the set M as a metric space, with the tacit understanding that a metric d is available 
on demand. 

Examples 3.1 

(a) Every set M admits at least one metric. For example, check that the function defined by d(x, y) = 1 for any x ≠ y in M, and d(x, x) = 0 for all x in M, is a metric. This mundane, but always available, metric is called the discrete metric on M. It will prove to be much more interesting than first appearances suggest. A set supplied with its discrete metric will be called a discrete space.

(b) An important example for our purposes is the real line ℝ together with its usual metric d(a, b) = |a − b|. Any time we refer to ℝ without explicitly naming a metric, the absolute value metric is always understood to be the one that we have in mind.

(c) Any subset of a metric space is again a metric space in a very natural way. If d is a metric on M, and if A is a subset of M, then d(x, y) is defined for any pair of points x, y ∈ A. Moreover, the restriction of d to A × A obviously still satisfies properties (i)-(iv). That is, the metric that is defined on M automatically defines a metric on A by restriction. We will even use the same letter d and simply refer to the metric space (A, d). Of particular interest in this regard is that ℕ, ℤ, ℚ, and ℝ \ ℚ each come already supplied with a natural metric, namely, the restriction of the usual metric on ℝ. In each case, we will refer to this restriction as the usual metric.
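Properties (i)-(iv) can be spot-checked by brute force on a finite sample of points. A failed check exhibits an honest counterexample. A passed check only means that no counterexample turned up in the sample, so it proves nothing about the whole space. The helper names `check_metric` and `discrete` are hypothetical.

```python
from itertools import product


def check_metric(points, d):
    """Brute-force check of properties (i)-(iv) on a finite sample of points."""
    pts = list(points)
    for x, y in product(pts, repeat=2):
        if not 0 <= d(x, y) < float("inf"):          # (i)
            return False
        if (d(x, y) == 0) != (x == y):               # (ii)
            return False
        if d(x, y) != d(y, x):                       # (iii)
            return False
    for x, y, z in product(pts, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):              # (iv) triangle inequality
            return False
    return True


def discrete(x, y):
    """The discrete metric of Example 3.1(a)."""
    return 0 if x == y else 1
```

For instance, `check_metric(range(-5, 6), lambda a, b: (a - b) ** 2)` is `False`: the triple 0, 1, 2 gives 4 > 1 + 1, so squared distance violates the triangle inequality.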


EXERCISES 


1. Show that d(x, y) = |1/x − 1/y| defines a metric on (0, ∞).

▷ 2. If d is a metric on M, show that |d(x, z) − d(y, z)| ≤ d(x, y) for any x, y, z ∈ M.

3. As it happens, some of our requirements for a metric are redundant. To see why this is so, let M be a set and suppose that d : M × M → ℝ satisfies d(x, y) = 0 if and only if x = y, and d(x, y) ≤ d(x, z) + d(y, z) for all x, y, z ∈ M. Prove that d is a metric; that is, show that d(x, y) ≥ 0 and d(x, y) = d(y, x) hold for all x, y.

4. Let M be a set and suppose that d : M × M → [0, ∞) satisfies properties (i), (ii), and (iii) for a metric on M and the triangle inequality reversed: d(x, y) ≥ d(x, z) + d(z, y). Prove that M has at most one point.

▷ 5. There are other, albeit less natural, choices for a metric on ℝ. For instance, check that ρ(a, b) = √|a − b|, σ(a, b) = |a − b|/(1 + |a − b|), and τ(a, b) = min{|a − b|, 1} each define metrics on ℝ. [Hint: To show that σ is a metric, you might first show that the function F(t) = t/(1 + t) is increasing and satisfies F(s + t) ≤ F(s) + F(t) for s, t ≥ 0. A similar approach will also work for ρ and τ.]

> 6. If $d$ is any metric on $M$, show that $\rho(x, y) = \sqrt{d(x, y)}$, $\sigma(x, y) = d(x, y)/(1 +
d(x, y))$, and $\tau(x, y) = \min\{d(x, y), 1\}$ are also metrics on $M$. [Hint: $\sigma(x, y) =
F(d(x, y))$, where $F$ is as in Exercise 5.]

7. Here is a generalization of Exercises 5 and 6. Let $f : [0, \infty) \to [0, \infty)$ be
increasing and satisfy $f(0) = 0$, and $f(x) > 0$ for all $x > 0$. If $f$ also satisfies
$f(x + y) \le f(x) + f(y)$ for all $x, y \ge 0$, then $f \circ d$ is a metric whenever $d$
is a metric. Show that each of the following conditions is sufficient to ensure that
$f(x + y) \le f(x) + f(y)$ for all $x, y \ge 0$:

(a) $f$ has a second derivative satisfying $f'' \le 0$;

(b) $f$ has a decreasing first derivative;

(c) $f(x)/x$ is decreasing for $x > 0$.

[Hint: First show that (a) $\Rightarrow$ (b) $\Rightarrow$ (c).]

8. If $d_1$ and $d_2$ are both metrics on the same set $M$, which of the following yield
metrics on $M$: $d_1 + d_2$? $\max\{d_1, d_2\}$? $\min\{d_1, d_2\}$? If $d$ is a metric, is $d^2$ a metric?

9. Recall that $2^{\mathbb{N}}$ denotes the set of all sequences (or "strings") of 0s and 1s. Show
that $d(a, b) = \sum_{n=1}^\infty 2^{-n} |a_n - b_n|$, where $a = (a_n)$ and $b = (b_n)$ are sequences of
0s and 1s, defines a metric on $2^{\mathbb{N}}$.

10. The Hilbert cube $H^\infty$ is the collection of all real sequences $x = (x_n)$ with
$|x_n| \le 1$ for $n = 1, 2, \ldots$.

(i) Show that $d(x, y) = \sum_{n=1}^\infty 2^{-n} |x_n - y_n|$ defines a metric on $H^\infty$.

(ii) Given $x, y \in H^\infty$ and $k \in \mathbb{N}$, let $M_k = \max\{|x_1 - y_1|, \ldots, |x_k - y_k|\}$. Show
that $2^{-k} M_k \le d(x, y) \le M_k + 2^{-k+1}$.

11. Let $\mathbb{R}^\infty$ denote the collection of all real sequences $x = (x_n)$. Show that the
expression

$$d(x, y) = \sum_{n=1}^\infty 2^{-n} \frac{|x_n - y_n|}{1 + |x_n - y_n|}$$

defines a metric on $\mathbb{R}^\infty$.

12. Check that $d(f, g) = \max_{a \le t \le b} |f(t) - g(t)|$ defines a metric on $C[a, b]$,
the collection of all continuous, real-valued functions defined on the closed interval
$[a, b]$.

13. Fréchet's metric on $C[0, 1]$ is by no means the only choice (although we will
see later that it is a good one). For example, show that $\rho(f, g) = \int_0^1 |f(t) - g(t)|\,dt$
and $\sigma(f, g) = \int_0^1 \min\{|f(t) - g(t)|, 1\}\,dt$ also define metrics on $C[0, 1]$.

> 14. We say that a subset $A$ of a metric space $M$ is bounded if there is some $x_0 \in M$
and some constant $C < \infty$ such that $d(a, x_0) \le C$ for all $a \in A$. Show that a finite
union of bounded sets is again bounded.

> 15. We define the diameter of a nonempty subset $A$ of $M$ by $\operatorname{diam}(A) =
\sup\{d(a, b) : a, b \in A\}$. Show that $A$ is bounded if and only if $\operatorname{diam}(A)$ is finite.
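A finite numerical check cannot prove that a function is a metric, but it can catch a false candidate before one attempts a proof. The sketch below tests the metric axioms for the functions $\rho$, $\sigma$, $\tau$ of Exercise 5 on a handful of real numbers. The helper name `metric_violations` and the sample points are illustrative choices of ours, not part of the text; an empty result is evidence, not a proof.

```python
import itertools
import math

# The three candidate metrics on R from Exercise 5.
def rho(a, b):
    return math.sqrt(abs(a - b))

def sigma(a, b):
    return abs(a - b) / (1 + abs(a - b))

def tau(a, b):
    return min(abs(a - b), 1.0)

def metric_violations(d, points, tol=1e-12):
    """List the pairs/triples from `points` at which d breaks a metric axiom.

    A finite check like this can expose a counterexample, but an empty
    result is only evidence, not a proof.
    """
    bad = []
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0 or abs(d(x, y) - d(y, x)) > tol:
            bad.append(("sign/symmetry", x, y))
        if (d(x, y) == 0) != (x == y):
            bad.append(("zero iff equal", x, y))
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            bad.append(("triangle", x, y, z))
    return bad

sample = [-3.0, -1.0, -0.25, 0.0, 0.1, 0.5, 2.0, 7.5]
for name, d in [("rho", rho), ("sigma", sigma), ("tau", tau)]:
    print(name, len(metric_violations(d, sample)))  # 0 for each
```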


Normed Vector Spaces 

A large and important class of metric spaces are also vector spaces (over $\mathbb{R}$ or $\mathbb{C}$).
Notice, for example, that $C[0, 1]$ is a vector space (and even a ring). An easy way to
build a metric on a vector space is by way of a length function or norm. A norm on a
vector space $V$ is a function $\|\cdot\| : V \to [0, \infty)$ satisfying:

(i) $0 \le \|x\| < \infty$ for all $x \in V$;

(ii) $\|x\| = 0$ if and only if $x = 0$ (the zero vector in $V$);

(iii) $\|\alpha x\| = |\alpha|\,\|x\|$ for any scalar $\alpha$ and any $x \in V$; and

(iv) the triangle inequality: $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in V$.

A function $\|\cdot\| : V \to [0, \infty)$ satisfying all of the above properties except (ii) is called
a pseudonorm on $V$; that is, a pseudonorm permits nonzero vectors to have 0 length.

The pair $(V, \|\cdot\|)$, consisting of a vector space $V$ together with a norm on $V$, is
called a normed vector space (or normed linear space). Just as with metric spaces, we
may be a bit lax with this formality. Phrases such as "let $V$ be a normed vector space"
carry the tacit understanding that a norm is lurking about in the background.

It is easy to see that any norm induces a metric on $V$ by setting $d(x, y) = \|x - y\|$.
We will refer to this particular metric as the usual metric on $(V, \|\cdot\|)$. We may even be
so bold as to refer to $(V, \|\cdot\|)$ as a metric space with the clear understanding that the
usual metric induced by the norm is the one that we have in mind. Not all metrics on
a vector space come from norms, however, so we cannot afford to be totally negligent
(see Exercise 16).
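As a small illustration of the induced metric $d(x, y) = \|x - y\|$, the sketch below builds $d$ from the Euclidean norm on $\mathbb{R}^3$ and checks two properties that every norm-induced metric has: translation invariance and $d(\alpha x, \alpha y) = |\alpha|\,d(x, y)$. The helper names `euclidean_norm` and `induced_metric` and the sample vectors are our own hypothetical choices.

```python
import math

def euclidean_norm(v):
    """||v||_2 for a vector given as a tuple of floats."""
    return math.sqrt(sum(t * t for t in v))

def induced_metric(norm):
    """Return the metric d(x, y) = ||x - y|| induced by a norm."""
    def d(x, y):
        return norm(tuple(a - b for a, b in zip(x, y)))
    return d

d = induced_metric(euclidean_norm)
x, y, z = (1.0, 2.0, 2.0), (0.0, 0.0, 0.0), (5.0, -1.0, 3.0)
alpha = -2.5

# d(x + z, y + z) should equal d(x, y); d(alpha x, alpha y) should equal |alpha| d(x, y).
shifted = d(tuple(a + c for a, c in zip(x, z)), tuple(b + c for b, c in zip(y, z)))
scaled = d(tuple(alpha * a for a in x), tuple(alpha * b for b in y))
print(d(x, y), shifted, scaled)  # 3.0 3.0 7.5
```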

Examples 3.2 

(a) The absolute value function $|\cdot|$ clearly defines a norm on $\mathbb{R}$.

(b) Each of the following defines a norm on $\mathbb{R}^n$:

$$\|x\|_1 = \sum_{i=1}^n |x_i|, \qquad \|x\|_2 = \Big( \sum_{i=1}^n |x_i|^2 \Big)^{1/2}, \qquad \|x\|_\infty = \max_{1 \le i \le n} |x_i|,$$

where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. The first and last expressions are very easy to check,
while the second takes a bit more work.
(Although this is probably familiar from calculus, we will supply a proof shortly.)
The function $\|\cdot\|_2$ is often called the Euclidean norm and is generally accepted
as the norm of choice on $\mathbb{R}^n$. As it happens, for any $1 < p < \infty$, the expression
$\|x\|_p = \big( \sum_{i=1}^n |x_i|^p \big)^{1/p}$ defines a norm on $\mathbb{R}^n$; see Theorem 3.8.

(c) Each of the following defines a norm on $C[a, b]$:

$$\|f\|_1 = \int_a^b |f(t)|\,dt, \qquad \|f\|_2 = \Big( \int_a^b |f(t)|^2\,dt \Big)^{1/2}, \qquad \|f\|_\infty = \max_{a \le t \le b} |f(t)|.$$

Again, the second expression is hardest to check (and we will do so later; for
now, see Exercise 25). The last expression is generally taken as "the" norm on
$C[a, b]$.

(d) If $(V, \|\cdot\|)$ is a normed vector space, and if $W$ is a linear subspace of $V$, then
$W$ is also normed by $\|\cdot\|$. That is, the restriction of $\|\cdot\|$ to $W$ defines a norm
on $W$.

(e) We might also consider the sequence space analogues of the "scale" of norms
on $\mathbb{R}^n$ given in (b). For $1 \le p < \infty$, we define $\ell_p$ to be the collection of all
real sequences $x = (x_n)$ for which $\sum_{n=1}^\infty |x_n|^p < \infty$, and we define $\ell_\infty$ to be
the collection of all bounded real sequences. Each $\ell_p$ is a vector space under
"coordinatewise" addition and scalar multiplication. Moreover, the expression
$\|x\|_p = \big( \sum_{n=1}^\infty |x_n|^p \big)^{1/p}$ if $1 \le p < \infty$, or $\|x\|_\infty = \sup_n |x_n|$ if $p = \infty$, defines a
norm on $\ell_p$. The cases $p = 1$ and $p = \infty$ are easy to check (see Exercise 21),
the case $p = 2$ is given as Theorem 3.4, while the case $1 < p < \infty$ is given as
Theorem 3.8.
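To make the norms of Examples 3.2(b) and (c) concrete, here is a minimal Python sketch. The function `p_norm` evaluates $\|x\|_p$ on $\mathbb{R}^n$ (with `math.inf` standing for the max norm), and `function_norms` approximates the three norms on $C[a, b]$; the midpoint rule and the uniform grid used for the maximum are our own numerical choices, so those values are only approximate. For $f(t) = t$ on $[0, 1]$ the exact values are $1/2$, $1/\sqrt{3}$, and $1$.

```python
import math

def p_norm(x, p):
    """||x||_p on R^n for 1 <= p < inf; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def function_norms(f, a, b, n=10_000):
    """Approximate ||f||_1, ||f||_2, ||f||_inf for f in C[a, b].

    The integrals use the midpoint rule with n subintervals and the max is
    taken over a uniform grid, so all three values are approximations.
    """
    h = (b - a) / n
    mids = [a + (k + 0.5) * h for k in range(n)]
    grid = [a + k * h for k in range(n + 1)]
    norm1 = h * sum(abs(f(t)) for t in mids)
    norm2 = math.sqrt(h * sum(f(t) ** 2 for t in mids))
    norm_inf = max(abs(f(t)) for t in grid)
    return norm1, norm2, norm_inf

x = [3.0, -4.0, 0.0]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))  # 7.0 5.0 4.0
print(function_norms(lambda t: t, 0.0, 1.0))  # close to (1/2, 1/sqrt(3), 1)
```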

We can complete the details of several of our examples if we prove that $\ell_2$ is a vector
space and that $\|\cdot\|_2$ is a norm on $\ell_2$. Now it is easy to see that if $\|x\|_2 = 0$, then $x_n = 0$
for all $n$ and hence that $x = 0$ (the zero vector in $\ell_2$). Also, given $x \in \ell_2$ and $\alpha \in \mathbb{R}$,
it is easy to see that $\alpha x \in \ell_2$, where $\alpha x = (\alpha x_n)$, and that $\|\alpha x\|_2 = |\alpha|\,\|x\|_2$. What is
not so clear is whether $x + y = (x_n + y_n)$ is in $\ell_2$ whenever $x$ and $y$ are in $\ell_2$. In other
words, if $x$ and $y$ are square-summable, does it follow that $x + y$ is square-summable?
A moment's reflection will convince you that to answer this question we will need
to know something about the "dot product" $\sum x_n y_n$. This extra bit of information is
supplied by the following lemma.

Lemma 3.3. (The Cauchy-Schwarz Inequality) $\sum_{i=1}^\infty |x_i y_i| \le \|x\|_2 \|y\|_2$ for
any $x, y \in \ell_2$.


proof. To simplify our notation a bit, let's agree to write $\langle x, y \rangle = \sum_{i=1}^\infty x_i y_i$. We
first consider the case where $x, y \in \mathbb{R}^n$ (that is, $x_i = 0 = y_i$ for all $i > n$). In this
case, $\langle x, y \rangle$ is the usual "dot product" in $\mathbb{R}^n$. Also notice that we may suppose
that $x, y \ne 0$. (There is nothing to show if either is 0.)

Now let $t \in \mathbb{R}$ and consider

$$0 \le \|x + ty\|_2^2 = \langle x + ty, x + ty \rangle = \|x\|_2^2 + 2t \langle x, y \rangle + t^2 \|y\|_2^2.$$

Since this (nontrivial) quadratic in $t$ is always nonnegative, it must have a nonpositive
discriminant. (Why?) Thus, $(2 \langle x, y \rangle)^2 - 4 \|x\|_2^2 \|y\|_2^2 \le 0$ or, after simplifying,
$|\langle x, y \rangle| \le \|x\|_2 \|y\|_2$. That is, $\big| \sum_{i=1}^n x_i y_i \big| \le \|x\|_2 \|y\|_2$.

Now this isn't quite what we wanted, but it actually implies the stronger inequality
in the statement of the lemma. Why? Because the inequality that we have
shown must also hold for the vectors $(|x_i|)$ and $(|y_i|)$. That is,

$$\sum_{i=1}^n |x_i|\,|y_i| \le \|(|x_i|)\|_2 \, \|(|y_i|)\|_2 = \|x\|_2 \|y\|_2.$$

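Finally, let $x, y \in \ell_2$. Then for each $n$ we have

$$\sum_{i=1}^n |x_i y_i| \le \Big( \sum_{i=1}^n |x_i|^2 \Big)^{1/2} \Big( \sum_{i=1}^n |y_i|^2 \Big)^{1/2} \le \|x\|_2 \|y\|_2.$$

Thus, $\sum_{i=1}^\infty x_i y_i$ must be absolutely convergent and satisfy $\sum_{i=1}^\infty |x_i y_i| \le
\|x\|_2 \|y\|_2$. □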
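The two ingredients of the proof, the Cauchy-Schwarz bound itself and the nonpositive discriminant of $t \mapsto \|x + ty\|_2^2$, can be spot-checked on random vectors in $\mathbb{R}^n$. The sketch below is only a sanity check under our own choices of sample size, seed, and tolerance; it is not a substitute for the proof.

```python
import random

def dot(x, y):
    """<x, y> = sum x_i y_i for finite vectors."""
    return sum(a * b for a, b in zip(x, y))

def cauchy_schwarz_gap(x, y):
    """||x||_2 ||y||_2 - sum |x_i y_i|; Lemma 3.3 says this is >= 0."""
    return dot(x, x) ** 0.5 * dot(y, y) ** 0.5 - sum(abs(a * b) for a, b in zip(x, y))

def discriminant(x, y):
    """Discriminant of t -> ||x + t y||_2^2 = ||x||^2 + 2t<x, y> + t^2 ||y||^2."""
    return (2 * dot(x, y)) ** 2 - 4 * dot(x, x) * dot(y, y)

random.seed(0)
worst_gap, worst_disc = float("inf"), -float("inf")
for _ in range(1000):
    n = random.randint(1, 10)
    x = [random.uniform(-5, 5) for _ in range(n)]
    y = [random.uniform(-5, 5) for _ in range(n)]
    worst_gap = min(worst_gap, cauchy_schwarz_gap(x, y))
    worst_disc = max(worst_disc, discriminant(x, y))
print(worst_gap >= -1e-9, worst_disc <= 1e-9)  # True True
```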


Now we are ready to prove the triangle inequality for the $\ell_2$-norm.
Theorem 3.4. (Minkowski's Inequality) If $x, y \in \ell_2$, then $x + y \in \ell_2$. Moreover,
$\|x + y\|_2 \le \|x\|_2 + \|y\|_2$.

proof. It follows from the Cauchy-Schwarz inequality that for each $n$ we have

$$\sum_{i=1}^n |x_i + y_i|^2 = \sum_{i=1}^n |x_i|^2 + 2 \sum_{i=1}^n x_i y_i + \sum_{i=1}^n |y_i|^2 \le \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = \big( \|x\|_2 + \|y\|_2 \big)^2.$$

Thus, since $n$ is arbitrary, we have $x + y \in \ell_2$ and $\|x + y\|_2 \le \|x\|_2 + \|y\|_2$. □

We have now shown that $\ell_2$ is a vector space and that $\|\cdot\|_2$ is a norm on $\ell_2$. As you
have no doubt already surmised, the proof is essentially identical to the one used to
show that $\|\cdot\|_2$ is a norm on $\mathbb{R}^n$. In the next section a variation on this theme will be
used to prove that $\ell_p$ is a vector space and that $\|\cdot\|_p$ is a norm.


EXERCISES 

16. Let $V$ be a vector space, and let $d$ be a metric on $V$ satisfying $d(x, y) =
d(x - y, 0)$ and $d(\alpha x, \alpha y) = |\alpha|\,d(x, y)$ for every $x, y \in V$ and every scalar $\alpha$.
Show that $\|x\| = d(x, 0)$ defines a norm on $V$ (that has $d$ as its "usual" metric). Give
an example of a metric on the vector space $\mathbb{R}$ that fails to be associated with a norm
in this way.

17. Recall that for $x \in \mathbb{R}^n$ we have defined $\|x\|_1 = \sum_{i=1}^n |x_i|$ and $\|x\|_\infty =
\max_{1 \le i \le n} |x_i|$. Check that each of these is indeed a norm on $\mathbb{R}^n$.

> 18. Show that $\|x\|_\infty \le \|x\|_2 \le \|x\|_1$ for any $x \in \mathbb{R}^n$. Also check that $\|x\|_1 \le
n \|x\|_\infty$ and $\|x\|_1 \le \sqrt{n}\, \|x\|_2$.

19. Show that we have $\sum_{i=1}^n x_i y_i = \|x\|_2 \|y\|_2$ (equality in the Cauchy-Schwarz
inequality) if and only if $x$ and $y$ are proportional, that is, if and only if either $x = \alpha y$
or $y = \alpha x$ for some $\alpha \ge 0$.

20. Show that $\|A\| = \max_{1 \le i \le n} \big( \sum_{j=1}^m |a_{i,j}|^2 \big)^{1/2}$ is a norm on the vector space
$\mathbb{R}^{n \times m}$ of all $n \times m$ real matrices $A = [a_{i,j}]$.

21. Recall that we defined $\ell_1$ to be the collection of all absolutely summable sequences
under the norm $\|x\|_1 = \sum_{n=1}^\infty |x_n|$, and we defined $\ell_\infty$ to be the collection
of all bounded sequences under the norm $\|x\|_\infty = \sup_{n \ge 1} |x_n|$. Fill in the details
showing that each of these spaces is in fact a normed vector space.

22. Show that $\|x\|_\infty \le \|x\|_2$ for any $x \in \ell_2$, and that $\|x\|_2 \le \|x\|_1$ for any $x \in \ell_1$.

23. The subset of $\ell_\infty$ consisting of all sequences that converge to 0 is denoted
by $c_0$. (Note that $c_0$ is actually a linear subspace of $\ell_\infty$; thus $c_0$ is also a normed
vector space under $\|\cdot\|_\infty$.) Show that we have the following proper set inclusions:
$\ell_1 \subset \ell_2 \subset c_0 \subset \ell_\infty$.
More Inequalities

We next supply the promised extension of Theorem 3.4 to the spaces $\ell_p$, $1 < p < \infty$.
Just as in the case of $\ell_2$, notice that several facts are easy to check. For example, it is clear
that $\|x\|_p = 0$ implies that $x = 0$, and it is easy to see that $\|\alpha x\|_p = |\alpha|\,\|x\|_p$ for any scalar
$\alpha$. Thus we lack only the triangle inequality. We begin with a few classical inequalities
that are of interest in their own right. The first shows that $\ell_p$ is at least a vector space:

Lemma 3.5. Let $1 < p < \infty$ and let $a, b \ge 0$. Then $(a + b)^p \le 2^p (a^p + b^p)$.
Consequently, $x + y \in \ell_p$ whenever $x, y \in \ell_p$.

proof. $(a + b)^p \le (2 \max\{a, b\})^p = 2^p \max\{a^p, b^p\} \le 2^p (a^p + b^p)$. Thus, if
$x, y \in \ell_p$, then $\sum_{n=1}^\infty |x_n + y_n|^p \le 2^p \sum_{n=1}^\infty |x_n|^p + 2^p \sum_{n=1}^\infty |y_n|^p < \infty$. □

Lemma 3.6. (Young's Inequality) Let $1 < p < \infty$ and let $q$ be defined by
$1/p + 1/q = 1$. Then, for any $a, b \ge 0$, we have $ab \le a^p/p + b^q/q$, with equality
occurring if and only if $a^{p-1} = b$.

proof. Since the inequality trivially holds if either $a$ or $b$ is 0, we may certainly
suppose that $a, b > 0$. Next notice that $q = p/(p - 1)$ also satisfies $1 < q < \infty$
and $p - 1 = p/q = 1/(q - 1)$. Thus, the functions $f(t) = t^{p-1}$ and $g(t) = t^{q-1}$
are inverses for $t \ge 0$.

The proof of the inequality follows from a comparison of areas (see Figure 3.1).
The area of the rectangle with sides of lengths $a$ and $b$ is at most the sum of the
areas under the graphs of the functions $y = x^{p-1}$ for $0 \le x \le a$ and $x = y^{q-1}$ for
$0 \le y \le b$. That is,

$$ab \le \int_0^a x^{p-1}\,dx + \int_0^b y^{q-1}\,dy = \frac{a^p}{p} + \frac{b^q}{q}.$$

Clearly, equality can occur only if $a^{p-1} = b$. □

When $p = q = 2$, Young's inequality reduces to the arithmetic-geometric mean inequality
(although it is usually stated in the form $\sqrt{ab} \le (a + b)/2$). Young's inequality
will supply the extension of the Cauchy-Schwarz inequality that we need.
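Young's inequality is easy to probe numerically: the gap $a^p/p + b^q/q - ab$ should never be negative, it should vanish when $b = a^{p-1}$, and for $p = q = 2$ it equals $(a - b)^2/2$. The sketch below checks all three on sample values of our own choosing; it only illustrates the lemma and proves nothing.

```python
import random

def young_gap(a, b, p):
    """a^p/p + b^q/q - ab with 1/p + 1/q = 1; Lemma 3.6 says this is >= 0."""
    q = p / (p - 1)
    return a ** p / p + b ** q / q - a * b

random.seed(1)
worst = min(
    young_gap(random.uniform(0, 3), random.uniform(0, 3), random.uniform(1.05, 6))
    for _ in range(10_000)
)
print(worst >= -1e-9)                 # True: no counterexample in the sample
print(young_gap(2.0, 2.0 ** 2, 3.0))  # equality case b = a^(p-1): 0 up to rounding
print(young_gap(1.0, 3.0, 2.0))       # p = q = 2 gives (a - b)^2 / 2 = 2.0
```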
Lemma 3.7. (Hölder's Inequality) Let $1 < p < \infty$ and let $q$ be defined by
$1/p + 1/q = 1$. Given $x \in \ell_p$ and $y \in \ell_q$, we have $\sum_{i=1}^\infty |x_i y_i| \le \|x\|_p \|y\|_q$.


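proof. We may suppose that $\|x\|_p > 0$ and $\|y\|_q > 0$ (since, otherwise, there is
nothing to show). Now, for $n \ge 1$ we use Young's inequality, with $a = |x_i|/\|x\|_p$ and
$b = |y_i|/\|y\|_q$, to estimate:

$$\sum_{i=1}^n \frac{|x_i y_i|}{\|x\|_p \|y\|_q} \le \frac{1}{p} \sum_{i=1}^n \frac{|x_i|^p}{\|x\|_p^p} + \frac{1}{q} \sum_{i=1}^n \frac{|y_i|^q}{\|y\|_q^q} \le \frac{1}{p} + \frac{1}{q} = 1.$$

Thus, $\sum_{i=1}^n |x_i y_i| \le \|x\|_p \|y\|_q$ for any $n \ge 1$, and the result follows. □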
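The same kind of spot check works for Hölder's inequality on finite vectors (that is, sequences with finitely many nonzero terms). The sketch below samples random vectors and exponents $p$, computes the conjugate exponent $q = p/(p - 1)$, and confirms that the gap $\|x\|_p \|y\|_q - \sum |x_i y_i|$ stays nonnegative. The sampling ranges and seed are arbitrary choices of ours.

```python
import random

def p_norm(x, p):
    """||x||_p for a finite vector and 1 <= p < inf."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def holder_gap(x, y, p):
    """||x||_p ||y||_q - sum |x_i y_i| with 1/p + 1/q = 1 (Lemma 3.7 says >= 0)."""
    q = p / (p - 1)
    return p_norm(x, p) * p_norm(y, q) - sum(abs(a * b) for a, b in zip(x, y))

random.seed(2)
ok = True
for _ in range(2000):
    n = random.randint(1, 8)
    x = [random.uniform(-2, 2) for _ in range(n)]
    y = [random.uniform(-2, 2) for _ in range(n)]
    p = random.uniform(1.1, 5.0)
    ok = ok and holder_gap(x, y, p) >= -1e-9
print(ok)  # True: no counterexample in the sample
```

With $p = 2$ the function reduces to the Cauchy-Schwarz gap, which is a quick way to see that Lemma 3.7 generalizes Lemma 3.3.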


Our proof of the triangle inequality will be made easier if we first isolate one of the
key calculations. Notice that if $x \in \ell_p$, then the sequence $(|x_n|^{p-1})_{n=1}^\infty \in \ell_q$, because
$(p - 1)q = p$. Moreover,

$$\big\| (|x_n|^{p-1}) \big\|_q = \Big( \sum_{n=1}^\infty |x_n|^p \Big)^{1/q} = \|x\|_p^{p-1}.$$
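The identity $\|(|x_n|^{p-1})\|_q = \|x\|_p^{p-1}$ rests only on $(p - 1)q = p$ and $p/q = p - 1$, so it can be confirmed numerically for any finite vector. The sketch below does this for a few exponents; the vector `x` is an arbitrary example of ours.

```python
import math

def p_norm(x, p):
    """||x||_p for a finite vector and 1 <= p < inf."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [0.5, -1.25, 2.0, 0.0, 3.0]
results = {}
for p in (1.5, 2.0, 3.0, 7.0):
    q = p / (p - 1)                            # conjugate exponent
    powered = [abs(t) ** (p - 1) for t in x]   # the sequence (|x_n|^(p-1))
    lhs = p_norm(powered, q)
    rhs = p_norm(x, p) ** (p - 1)
    results[p] = (lhs, rhs)
    print(p, math.isclose(lhs, rhs, rel_tol=1e-12))  # True for each p
```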


Theorem 3.8. (Minkowski's Inequality) Let $1 < p < \infty$. If $x, y \in \ell_p$, then $x +
y \in \ell_p$ and $\|x + y\|_p \le \|x\|_p + \|y\|_p$.


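proof. We have already shown that $x + y \in \ell_p$ (Lemma 3.5). To prove the
triangle inequality, we once again let $q$ be defined by $1/p + 1/q = 1$, and we now
use Hölder's inequality to estimate:

$$\begin{aligned}
\sum_{i=1}^\infty |x_i + y_i|^p &= \sum_{i=1}^\infty |x_i + y_i| \cdot |x_i + y_i|^{p-1} \\
&\le \sum_{i=1}^\infty |x_i| \cdot |x_i + y_i|^{p-1} + \sum_{i=1}^\infty |y_i| \cdot |x_i + y_i|^{p-1} \\
&\le \|x\|_p \cdot \big\| (|x_n + y_n|^{p-1}) \big\|_q + \|y\|_p \cdot \big\| (|x_n + y_n|^{p-1}) \big\|_q \\
&= \|x + y\|_p^{p-1} \big( \|x\|_p + \|y\|_p \big).
\end{aligned}$$

That is, $\|x + y\|_p^p \le \|x + y\|_p^{p-1} \big( \|x\|_p + \|y\|_p \big)$, and the triangle inequality follows
(if $x + y \ne 0$, divide both sides by $\|x + y\|_p^{p-1}$; if $x + y = 0$, there is nothing to show). □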
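As a numerical companion to Theorems 3.4 and 3.8 (and the easy cases $p = 1$ and $p = \infty$ from Exercise 21), the sketch below samples random pairs of vectors in $\mathbb{R}^n$ and records the smallest value of $\|x\|_p + \|y\|_p - \|x + y\|_p$ for several exponents. The exponent list, seed, and tolerance are our own choices.

```python
import math
import random

def p_norm(x, p):
    """||x||_p on R^n; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p; Theorem 3.8 says this is >= 0."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)

random.seed(3)
exponents = (1.0, 1.5, 2.0, 3.0, 10.0, math.inf)
worst = {p: float("inf") for p in exponents}
for _ in range(2000):
    n = random.randint(1, 8)
    x = [random.uniform(-3, 3) for _ in range(n)]
    y = [random.uniform(-3, 3) for _ in range(n)]
    for p in exponents:
        worst[p] = min(worst[p], minkowski_gap(x, y, p))
print(all(w >= -1e-9 for w in worst.values()))  # True: no counterexample found
```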


EXERCISES 

24. The conclusion of Lemma 3.7 also holds in the case $p = 1$ and $q = \infty$. Why?

25. The same techniques can be used to show that $\|f\|_p = \big( \int_0^1 |f(t)|^p\,dt \big)^{1/p}$
defines a norm on $C[0, 1]$ for any $1 < p < \infty$. State and prove the analogues of
Lemma 3.7 and Theorem 3.8 in this case. (Does Lemma 3.7 still hold in this setting
for $p = 1$ and $q = \infty$?)
26. Given $a, b > 0$, show that $\lim_{p \to \infty} (a^p + b^p)^{1/p} = \max\{a, b\}$. [Hint: If $a \le b$
and $r = a/b$, show that $(1/p) \log(1 + r^p) \to 0$ as $p \to \infty$.] What happens as
$p \to 0$? as $p \to -1$? as $p \to -\infty$?
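The hint in Exercise 26 also suggests a numerically stable way to evaluate $(a^p + b^p)^{1/p}$: write it as $\max\{a, b\}\,(1 + r^p)^{1/p}$ with $r = \min\{a, b\}/\max\{a, b\}$. Computing `a ** p` directly would raise `OverflowError` in Python for large `p` (for instance $10.0^{1000}$), while the rewritten form stays finite. The sketch below is restricted to $p > 0$ and leaves the other limits asked about in the exercise alone.

```python
import math

def p_sum(a, b, p):
    """(a^p + b^p)^(1/p) for a, b > 0 and p > 0.

    Written as max(a, b) * (1 + r^p)^(1/p) with r = min/max, following the
    hint, so that large p does not overflow a^p + b^p.
    """
    lo, hi = min(a, b), max(a, b)
    r = lo / hi
    return hi * math.exp(math.log1p(r ** p) / p)

for p in (1, 2, 10, 100, 1000):
    print(p, p_sum(1.0, 2.0, p))  # approaches max(1, 2) = 2 as p grows
```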


Limits in Metric Spaces 

Now that we have generalized the notion of distance, we can easily define the notions 
of convergence and continuity in metric spaces. It will help a bit, though, if we first 
generate some notation for “small” sets. Throughout this section, unless otherwise 
specified, we will assume that we are always dealing with a generic metric space 
(M,d). 

Given $x \in M$ and $r > 0$, the set $B_r(x) = \{y \in M : d(x, y) < r\}$ is called the
open ball about $x$ of radius $r$. If we also need to refer to the metric $d$, then we write
$B_r^d(x)$. We may occasionally refer to the set $\{y \in M : d(x, y) \le r\}$ as the closed
ball about $x$ of radius $r$, but we will not bother with any special notation for closed
balls.

Examples 3.9 

(a) In $\mathbb{R}$ we have $B_r(x) = (x - r, x + r)$, the open interval of radius $r$ about $x$, while
in $\mathbb{R}^2$ the set $B_r(x)$ is the open disk of radius $r$ centered at $x$.

(b) In a discrete space $B_1(x) = \{x\}$ and $B_2(x) = M$.

(c) In a normed vector space $(V, \|\cdot\|)$ the balls centered at 0 play a special role (see
Exercise 32); in this setting $B_r(0) = \{x : \|x\| < r\}$.

A subset $A$ of $M$ is said to be bounded if it is contained in some ball, that is, if $A \subset
B_r(x)$ for some $x \in M$ and some $r > 0$. But exactly which $x$ and $r$ does not much matter.
In fact, $A$ is bounded if and only if for any $x \in M$ we have $\sup_{a \in A} d(x, a) < \infty$. (Why?)
Related to this is the diameter of $A$, defined by $\operatorname{diam}(A) = \sup\{d(a, b) : a, b \in A\}$. The
diameter of $A$ is a convenient measure of size because it does not refer to points outside
of $A$.
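For finite sets these definitions can be computed directly. The sketch below implements the open ball and the diameter for a finite list of points under a given metric, and reproduces Examples 3.9(b) for the discrete metric on a four-point set. The point labels and helper names are hypothetical; for a finite nonempty set the supremum in $\operatorname{diam}(A)$ is simply a maximum.

```python
import itertools

def discrete(x, y):
    """The discrete metric: 0 if x == y, else 1."""
    return 0 if x == y else 1

def usual(x, y):
    """The usual metric |x - y| on R."""
    return abs(x - y)

def ball(points, d, center, r):
    """B_r(center) = {y : d(center, y) < r}, computed inside a finite list of points."""
    return {y for y in points if d(center, y) < r}

def diam(A, d):
    """diam(A) = sup{d(a, b) : a, b in A}; for a finite nonempty A the sup is a max."""
    return max(d(a, b) for a, b in itertools.product(A, repeat=2))

M = ["a", "b", "c", "d"]
print(ball(M, discrete, "a", 1))            # {'a'}, as in Examples 3.9(b)
print(ball(M, discrete, "a", 2) == set(M))  # True
print(diam([0.0, 0.5, 3.0], usual))         # 3.0
```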


EXERCISES 

Each of the following exercises is set in a generic metric space (M, d). 

27. Show that $\operatorname{diam}(B_r(x)) \le 2r$, and give an example where strict inequality
occurs.

28. If $\operatorname{diam}(A) < r$, show that $A \subset B_r(a)$ for some $a \in A$.

> 29. Prove that $A$ is bounded if and only if $\operatorname{diam}(A) < \infty$.

> 30. If $A \subset B$, show that $\operatorname{diam}(A) \le \operatorname{diam}(B)$.

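31. Give an example where $\operatorname{diam}(A \cup B) > \operatorname{diam}(A) + \operatorname{diam}(B)$. If $A \cap B \ne \emptyset$,
show that $\operatorname{diam}(A \cup B) \le \operatorname{diam}(A) + \operatorname{diam}(B)$.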
> 32. In a normed vector space $(V, \|\cdot\|)$ show that $B_r(x) = x + B_r(0) = \{x + y :
\|y\| < r\}$ and that $B_r(0) = r B_1(0) = \{rx : \|x\| < 1\}$.


A neighborhood of x is any set containing an open ball about x. You should think of 
a neighborhood of x as a “thick” set of points near x. We will use this new terminology 
to streamline our definition of convergence. 

We say that a sequence of points $(x_n)$ in $M$ converges to a point $x \in M$ if
$d(x_n, x) \to 0$. Now, since this definition is stated in terms of the sequence of real
numbers $(d(x_n, x))_{n=1}^\infty$, we can easily derive the following equivalent reformulations:

$(x_n)$ converges to $x$ if and only if, given any $\varepsilon > 0$, there is
an integer $N \ge 1$ such that $d(x_n, x) < \varepsilon$ whenever $n \ge N$,

or 

$(x_n)$ converges to $x$ if and only if, given any $\varepsilon > 0$, there is
an integer $N \ge 1$ such that $\{x_n : n \ge N\} \subset B_\varepsilon(x)$.

If it should happen that $\{x_n : n \ge N\} \subset A$ for some $N$, we say that the sequence $(x_n)$ is
eventually in $A$. Thus, our last formulation can be written

$(x_n)$ converges to $x$ if and only if, given any $\varepsilon > 0$,
the sequence $(x_n)$ is eventually in $B_\varepsilon(x)$

or, in yet another incarnation, 

$(x_n)$ converges to $x$ if and only if the sequence
$(x_n)$ is eventually in every neighborhood of $x$.

This final version is blessed by a total lack of $N$s and $\varepsilon$s! In any event, just as with
real sequences, we usually settle for the shorthand $x_n \to x$ in place of the phrase $(x_n)$
converges to $x$. On occasion we will want to display the set $M$, or $d$, or both, and so
we may also write $x_n \xrightarrow{d} x$ or $x_n \to x$ in $(M, d)$. We also define Cauchy (or $d$-Cauchy,
if we need to specify $d$) in the obvious way: A sequence $(x_n)$ is Cauchy in $(M, d)$ if,
given any $\varepsilon > 0$, there is an integer $N \ge 1$ such that $d(x_m, x_n) < \varepsilon$ whenever $m, n \ge N$.
We can reword this just a bit to read: $(x_n)$ is Cauchy if and only if, given $\varepsilon > 0$, there is
an integer $N \ge 1$ such that $\operatorname{diam}(\{x_n : n \ge N\}) < \varepsilon$. (How?)
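The tail-diameter form of the Cauchy condition is easy to explore for a concrete real sequence. For $x_n = 1/n$ with the usual metric, the tail $\{x_n : n \ge N\}$ has diameter exactly $1/N$, so the condition holds with any $N > 1/\varepsilon$. The sketch below approximates a tail diameter by truncating the tail at a finite `horizon` (our own device, which gives a value slightly below the true diameter) and finds the first admissible $N$ for a given $\varepsilon$.

```python
def tail_diameter(seq, N, horizon):
    """diam({x_N, ..., x_horizon}) for a real sequence under the usual metric.

    This truncates the tail {x_n : n >= N} at n = horizon, so it only
    approximates (from below) the diameter of the full tail.
    """
    tail = [seq(n) for n in range(N, horizon + 1)]
    return max(tail) - min(tail)   # for a finite set of reals, diam = max - min

def first_N(eps):
    """For x_n = 1/n the full tail has diameter 1/N, so this N makes it < eps."""
    N = 1
    while 1 / N >= eps:
        N += 1
    return N

def x(n):
    return 1 / n

print(tail_diameter(x, 10, 100_000))  # about 0.1 - 1e-5
print(first_N(0.01))                  # 101
```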

Much of what we already know about sequences of real numbers will carry over
to this new setting, but not everything! The reader is strongly encouraged to test the
limits of this transition by supplying proofs for the following easy results.


EXERCISES 

Each of the following exercises is set in a metric space M with metric d. 

33. Limits are unique. [Hint: $d(x, y) \le d(x, x_n) + d(x_n, y)$.]

> 34. If $x_n \to x$ in $(M, d)$, show that $d(x_n, y) \to d(x, y)$ for any $y \in M$. More
generally, if $x_n \to x$ and $y_n \to y$, show that $d(x_n, y_n) \to d(x, y)$.
35. If $x_n \to x$, then $x_{n_k} \to x$ for any subsequence $(x_{n_k})$ of $(x_n)$.

> 36. A convergent sequence is Cauchy, and a Cauchy sequence is bounded (that is,
the set $\{x_n : n \ge 1\}$ is bounded).

> 37. A Cauchy sequence with a convergent subsequence converges.

38. A sequence $(x_n)$ has a Cauchy subsequence if and only if it has a subsequence
$(x_{n_k})$ for which $d(x_{n_k}, x_{n_{k+1}}) < 2^{-k}$ for all $k$.

> 39. If every subsequence of $(x_n)$ has a further subsequence that converges to $x$, then
$(x_n)$ converges to $x$.


Now, while several familiar results about sequences in $\mathbb{R}$ have carried over successfully
to the "abstract" setting of metric spaces, at least a few will not survive the journey.
Two especially fragile cases are: Cauchy sequences need not converge, and bounded
sequences need not have convergent subsequences. A few specific examples might help
your appreciation of their delicacy.

Examples 3.10 

(a) Consider the sequence $(1/n)_{n=1}^\infty$, living in the space $M = (0, 1]$ under its usual
metric. Then, $(1/n)$ is Cauchy but, annoyingly, does not converge to any point
in $M$. (Why?) Notice too that $(1/n)$ is a bounded sequence with no convergent
subsequence.

(b) Consider $M = \mathbb{R}$ supplied with the discrete metric. Then, $(n)_{n=1}^\infty$ is a bounded
sequence with no Cauchy subsequence!

(c) At least one good thing happens in any discrete space: Cauchy sequences always
converge. But for a simple reason. In a discrete space, a sequence $(x_n)$ is Cauchy
if and only if it is eventually constant; that is, if and only if $x_n = x$ for some
(fixed) $x$ and all $n$ sufficiently large. (Why?)

Let's take a closer look at $\mathbb{R}^n$ (with its usual metric). Since $d(x, y) = \|x - y\|_2 =
\big( \sum_{i=1}^n |x_i - y_i|^2 \big)^{1/2} \ge |x_j - y_j|$ for any $j = 1, \ldots, n$, it follows that a sequence of vectors
$x^{(k)} = (x_1^k, \ldots, x_n^k)$ in $\mathbb{R}^n$ converges (is Cauchy) if and only if each of the coordinate
sequences $(x_j^k)_{k=1}^\infty$ converges (is Cauchy) in $\mathbb{R}$. (Why?) Thus, nearly every fact about
convergent sequences in $\mathbb{R}$ "lifts" successfully to $\mathbb{R}^n$. For example, any Cauchy sequence
in $\mathbb{R}^n$ converges in $\mathbb{R}^n$, and any bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.

How much of this has to do with the particular metric that we chose for $\mathbb{R}^n$? And
will this same result "lift" to the spaces $\ell_1$, $\ell_2$, or $\ell_\infty$, for example? We cannot hope
for much, but each of these spaces shares at least one thing in common with $\mathbb{R}^n$. Since
all three of the norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ satisfy $\|x\| \ge |x_j|$ for any $j$, it follows
that convergence in $\ell_1$, $\ell_2$, or $\ell_\infty$ will imply "coordinatewise" convergence. That is, if
$x^{(k)} = (x_n^k)_{n=1}^\infty$, $k = 1, 2, \ldots$, is a sequence (of sequences!) in, say, $\ell_1$, and if $x^{(k)} \to x$
in $\ell_1$, then we must have $x_n^k \to x_n$ (as $k \to \infty$) for each $n = 1, 2, \ldots$. A simple
example will convince you that the converse does not hold, in general, in this new
setting. The sequence $e^{(k)} = (0, \ldots, 0, 1, 0, \ldots)$, where the $k$th entry is 1 and the rest
are 0s, converges "coordinatewise" to $0 = (0, 0, \ldots)$, but $(e^{(k)})$ does not converge to 0
in any of the metric spaces $\ell_1$, $\ell_2$, or $\ell_\infty$. Why? Because in each of the three spaces we
have $d(e^{(k)}, 0) = \|e^{(k)}\| = 1$. In fact, $(e^{(k)})$ is not even Cauchy because in each case we
also have $\|e^{(k)} - e^{(m)}\| \ge 1$ for any $k \ne m$.
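The behavior of the unit sequences $e^{(k)}$ can be seen in a few lines of Python. Because each $e^{(k)}$ has only one nonzero entry, truncating to the first `LENGTH` coordinates (our own choice, with `LENGTH` larger than every $k$ used) loses nothing for these norms. The output shows $\|e^{(k)}\| = 1$ in $\ell_1$, $\ell_2$, $\ell_\infty$, distances $2$, $\sqrt{2}$, $1$ between distinct terms, and yet each fixed coordinate is eventually 0.

```python
import math

def p_norm(x, p):
    """||x||_p for finitely supported sequences; p = math.inf gives sup norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def e(k, length):
    """The first `length` coordinates of e^(k); all later coordinates are 0."""
    return [1.0 if n == k else 0.0 for n in range(1, length + 1)]

LENGTH = 20
for p in (1, 2, math.inf):
    norms = {p_norm(e(k, LENGTH), p) for k in range(1, 11)}
    gaps = {p_norm([a - b for a, b in zip(e(k, LENGTH), e(m, LENGTH))], p)
            for k in range(1, 11) for m in range(1, 11) if k != m}
    print(p, norms, gaps)
# 1 {1.0} {2.0}
# 2 {1.0} {1.4142135623730951}
# inf {1.0} {1.0}

# Coordinatewise convergence: the nth coordinate of e^(k) is 0 once k > n.
n = 3
print([e(k, LENGTH)[n - 1] for k in range(1, 8)])  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```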


EXERCISES 

40. Here is a positive result about $\ell_1$ that may restore your faith in intuition. Given
any (fixed) element $x \in \ell_1$, show that the sequence $x^{(k)} = (x_1, \ldots, x_k, 0, \ldots) \in \ell_1$
(i.e., the first $k$ terms of $x$ followed by all 0s) converges to $x$ in $\ell_1$ norm. Show
that the same holds true in $\ell_2$, but give an example showing that it fails (in general)
in $\ell_\infty$.

41. Given $x, y \in \ell_2$, recall that $\langle x, y \rangle = \sum_{i=1}^\infty x_i y_i$. Show that if $x^{(k)} \to x$ and
$y^{(k)} \to y$ in $\ell_2$, then $\langle x^{(k)}, y^{(k)} \rangle \to \langle x, y \rangle$.

> 42. Two metrics $d$ and $\rho$ on a set $M$ are said to be equivalent if they generate the
same convergent sequences; that is, $d(x_n, x) \to 0$ if and only if $\rho(x_n, x) \to 0$. If $d$
is any metric on $M$, show that the metrics $\rho$, $\sigma$, and $\tau$, defined in Exercise 6, are all
equivalent to $d$.

> 43. Show that the usual metric on N is equivalent to the discrete metric. Show that 
any metric on a finite set is equivalent to the discrete metric. 

> 44. Show that the metrics induced by $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ on $\mathbb{R}^n$ are all equivalent.
[Hint: See Exercise 18.]

45. We say that two norms on the same vector space $X$ are equivalent if the metrics
they induce are equivalent. Show that $\|\cdot\|$ and $|||\cdot|||$ are equivalent on $X$ if and only
if they generate the same sequences tending to 0; that is, $\|x_n\| \to 0$ if and only if
$|||x_n||| \to 0$.

> 46. Given two metric spaces $(M, d)$ and $(N, \rho)$, we can define a metric on the
product $M \times N$ in a variety of ways. Our only requirement is that a sequence of
pairs $(a_n, x_n)$ in $M \times N$ should converge precisely when both coordinate sequences
$(a_n)$ and $(x_n)$ converge (in $(M, d)$ and $(N, \rho)$, respectively). Show that each of the
following define metrics on $M \times N$ that enjoy this property and that all three are
equivalent:

$$\begin{aligned}
d_1((a, x), (b, y)) &= d(a, b) + \rho(x, y), \\
d_2((a, x), (b, y)) &= \big( d(a, b)^2 + \rho(x, y)^2 \big)^{1/2}, \\
d_\infty((a, x), (b, y)) &= \max\{d(a, b), \rho(x, y)\}.
\end{aligned}$$

Henceforth, any implicit reference to "the" metric on $M \times N$, sometimes called the
product metric, will mean one of $d_1$, $d_2$, or $d_\infty$. Any one of them will serve equally
well; use whichever looks most convenient for the argument at hand.
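All three product metrics are built from the two coordinate distances $u = d(a, b)$ and $v = \rho(x, y)$, and they are tied together by the elementary chain $d_\infty \le d_2 \le d_1 \le 2 d_\infty$. The sketch below spot-checks that chain with $M = N = \mathbb{R}$ under the usual metric, a simplifying assumption of ours; it illustrates the comparison and does not replace the exercise.

```python
import math
import random

def d1(u, v):
    """d_1 on M x N, given the coordinate distances u = d(a, b), v = rho(x, y)."""
    return u + v

def d2(u, v):
    return math.hypot(u, v)       # (u^2 + v^2)^(1/2)

def dinf(u, v):
    return max(u, v)

# Take M = N = R with the usual metric, so u = |a - b| and v = |x - y|.
random.seed(4)
ok = True
for _ in range(5000):
    a, b, x, y = (random.uniform(-5, 5) for _ in range(4))
    u, v = abs(a - b), abs(x - y)
    ok = ok and dinf(u, v) <= d2(u, v) <= d1(u, v) <= 2 * dinf(u, v)
print(ok)  # True
print(d1(3.0, 4.0), d2(3.0, 4.0), dinf(3.0, 4.0))  # 7.0 5.0 4.0
```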


While we are not yet ready for an all-out attack on continuity, it couldn't hurt to give
a hint as to what is ahead. Given a function $f : (M, d) \to (N, \rho)$ between two metric
spaces, and given a point $x \in M$, we have at least two plausible sounding definitions
for the continuity of $f$ at $x$. Each definition is derived from its obvious counterpart for
real-valued functions by replacing absolute values with an appropriate metric.

For example, we might say that $f$ is continuous at $x$ if $\rho(f(x_n), f(x)) \to 0$
whenever $d(x_n, x) \to 0$. That is, $f$ should send sequences converging to $x$ into sequences
converging to $f(x)$. This says that $f$ "commutes" with limits: $f(\lim_{n \to \infty} x_n) =
\lim_{n \to \infty} f(x_n)$. Sounds like a good choice.

Or we might try doctoring the familiar $\varepsilon$-$\delta$ definition from a first course in calculus.
In this case we would say that $f$ is continuous at $x$ if, given any $\varepsilon > 0$, there always
exists a $\delta > 0$ such that $\rho(f(x), f(y)) < \varepsilon$ whenever $d(x, y) < \delta$. Written in slightly
different terms, this definition requires that $f(B_\delta^d(x)) \subset B_\varepsilon^\rho(f(x))$. That is, $f$ maps a
sufficiently small neighborhood of $x$ into a given neighborhood of $f(x)$.

We will rewrite the definition once more, but this time we will use an inverse image.
Recall that the inverse image of a set $A$, under a function $f : X \to Y$, is defined to be
the set $\{x \in X : f(x) \in A\}$ and is usually written $f^{-1}(A)$. (The inverse image of any set
under any function always makes sense. Although the notation is similar, inverse images
have nothing whatever to do with inverse functions, which don't always make sense.)
Stated in terms of an inverse image, our condition reads: $B_\delta^d(x) \subset f^{-1}(B_\varepsilon^\rho(f(x)))$.
Look a bit imposing? Well, it actually tells us quite a bit. It says that the inverse image
of a "thick" set containing $f(x)$ must still be "thick" near $x$. Curious. Figure 3.2 may
help you with these new definitions. Better still, draw a few pictures of your own!



This sets the stage for what is ahead. Each of the two possible definitions for continuity
seems perfectly reasonable. Certainly we would hope that the two turn out to be
equivalent. But what do convergent sequences have to do with "thick" sets? And just
what is a "thick" set anyway?


Notes and Remarks 

The quotation at the start of this chapter is taken from Fréchet [1950]; his thesis
appears in Fréchet [1906]. His book, Fréchet [1928], was published as one of the
volumes in a series of monographs edited by Émile Borel. The authors in this series
include every "name" French mathematician of that time: Baire, Borel, Lebesgue, Lévy,
de La Vallée Poussin, and many others. The full title of Fréchet's book, including
subtitle, is enlightening: Les espaces abstraits et leur théorie considérée comme introduction
à l'analyse générale (Abstract spaces and their theory considered as an introduction to
general analysis). The paper by Riesz mentioned in the introductory passage is Riesz
[1906].

It was Hausdorff who gave us the name "metric space." Indeed, his classic work
Grundzüge der Mengenlehre, Leipzig, 1914, is the source for much of our terminology
regarding abstract sets and abstract spaces. An English translation of Hausdorff's book
is available as Set Theory (Hausdorff [1937]). If we had left it up to Fréchet, we would
be calling metric spaces "spaces of type (D)."

For more on metric spaces, normed spaces, and $\mathbb{R}^n$, see Copson [1968], Goffman and
Pedrick [1965], Goldberg [1976], Hoffman [1975], Kaplansky [1977], Kasriel [1971],
Kolmogorov and Fomin [1970], and Kuller [1969]. For a look at modern applications
of metric space notions, see Barnsley [1988] and Edgar [1990].

Normed vector spaces were around for some time before anyone bothered to formalize
their definition. Quite often you will see the great Polish mathematician Stefan
Banach mentioned as the originator of normed vector spaces, but this is only partly true.
In any case, it is fair to say that Banach gave the first thorough treatment of normed
vector spaces, beginning with his thesis (Banach [1922]). We will have cause to mention
Banach's name frequently in these notes.

The several "name" inequalities that we saw in this chapter are, for the most part,
older than the study of norms and metrics. Most fall into the category of "mean values"
(various types of averages). An excellent source of information on inequalities and
mean values of every shape and size is a dense little book with the apt title Inequalities,
by Hardy, Littlewood, and Pólya [1952]. Beckenbach and Bellman [1961] provide an
elementary introduction to inequalities, including a few applications. For a very slick,
yet elementary proof of the inequalities of Hölder and Minkowski, see Maligranda
[1995].

Certain applications to numerical analysis and computational mathematics have
caused a renewed interest in mean values. For a brief introduction to this exciting area,
see the selection "On the arithmetic-geometric mean and similar iterative algorithms"
in Schoenberg [1982], and the articles by Almkvist and Berndt [1988], Carlson [1971],
and Miel [1983]. For a discussion of some of the computational practicalities, see
D. H. Bailey [1988].



CHAPTER FOUR 


Open Sets and Closed Sets 


Open Sets 

One of the themes of this (or any other) course in real analysis is the curious interplay
between various notions of "big" sets and "small" sets. We have seen at least one such
measure of size already: Uncountable sets are big, whereas countable sets are small. In
this chapter we will make precise what was only hinted at in Chapter Three: the rather
vague notion of a "thick" set in a metric space. For our purposes, a "thick" set will
be one that contains an entire neighborhood of each of its points. But perhaps we can
come up with a better name... Throughout this chapter, unless otherwise specified,
we live in a generic metric space $(M, d)$.

A set $U$ in a metric space $(M, d)$ is called an open set if $U$ contains a neighborhood
of each of its points. In other words, $U$ is an open set if, given $x \in U$, there is some
$\varepsilon > 0$ such that $B_\varepsilon(x) \subset U$.

Examples 4.1 

(a) In any metric space, the whole space $M$ is an open set. The empty set $\emptyset$ is also
open (by default).

(b) In $\mathbb{R}$, any open interval is an open set. Indeed, given $x \in (a, b)$, let $\varepsilon = \min
\{x - a, b - x\}$. Then, $\varepsilon > 0$ and $(x - \varepsilon, x + \varepsilon) \subset (a, b)$. The cases $(a, \infty)$ and
$(-\infty, b)$ are similar. While we're at it, notice that the interval $[0, 1)$, for example,
is not open in $\mathbb{R}$ because it does not contain an entire neighborhood of 0.

(c) In a discrete space, $B_1(x) = \{x\}$ is an open set for any $x$. (Why?) It follows that
every subset of a discrete space is open.

Before we get too carried away, we should follow the lead suggested by our last two 
examples and check that every open ball is in fact an open set. 

Proposition 4.2. For any $x \in M$ and any $\varepsilon > 0$, the open ball $B_\varepsilon(x)$ is an open set.

proof. Let $y \in B_\varepsilon(x)$. Then $d(x, y) < \varepsilon$ and hence $\delta = \varepsilon - d(x, y) > 0$. We
will show that $B_\delta(y) \subset B_\varepsilon(x)$ (as in Figure 4.1). Indeed, if $d(y, z) < \delta$, then,
by the triangle inequality,

$$d(x, z) \le d(x, y) + d(y, z) < d(x, y) + \delta = d(x, y) + \varepsilon - d(x, y) = \varepsilon.$$ □
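The choice $\delta = \varepsilon - d(x, y)$ in the proof can be tested by sampling. The sketch below works in $\mathbb{R}^2$ with the Euclidean metric (via `math.dist`, available in Python 3.8 and later), draws random points $z$ from $B_\delta(y)$, and counts how many fall outside $B_\varepsilon(x)$; the proof says none should. The function name and sample points are our own hypothetical choices, and sampling radii are kept slightly below $\delta$ so that every sampled $z$ lies strictly inside the ball.

```python
import math
import random

def check_ball_inside(x, eps, y, trials=10_000, seed=5):
    """Sample points z with d(y, z) < delta = eps - d(x, y) and confirm d(x, z) < eps.

    Works in R^2 with the Euclidean metric (math.dist); y must lie in B_eps(x).
    Returns the number of sampled z that land outside B_eps(x).
    """
    rng = random.Random(seed)
    delta = eps - math.dist(x, y)
    if delta <= 0:
        raise ValueError("y must lie in the open ball B_eps(x)")
    failures = 0
    for _ in range(trials):
        r = rng.uniform(0, 0.999 * delta)          # stay strictly inside B_delta(y)
        theta = rng.uniform(0, 2 * math.pi)
        z = (y[0] + r * math.cos(theta), y[1] + r * math.sin(theta))
        if math.dist(x, z) >= eps:
            failures += 1
    return failures

print(check_ball_inside((0.0, 0.0), 1.0, (0.6, 0.7)))  # 0
```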

Let's collect our thoughts. First, every open ball is open. Next, it follows from the
definition of open sets that an open set must actually be a union of open balls. In fact,
if $U$ is open, then $U = \bigcup \{B_\varepsilon(x) : B_\varepsilon(x) \subset U\}$. Moreover, any arbitrary union of open
balls is again an open set. (Why?) Here's what all of this means:

Theorem 4.3. An arbitrary union of open sets is again open; that is, if $(U_\alpha)_{\alpha \in A}$ 
is any collection of open sets, then $V = \bigcup_{\alpha \in A} U_\alpha$ is open. 

proof. If $x \in V$, then $x \in U_\alpha$ for some $\alpha \in A$. But then, since $U_\alpha$ is open, 
$B_\varepsilon(x) \subset U_\alpha \subset V$ for some $\varepsilon > 0$. □ 

Intersections aren’t nearly as generous: 

Theorem 4.4. A finite intersection of open sets is open; that is, if each of 
$U_1, \ldots, U_n$ is open, then so is $V = U_1 \cap \cdots \cap U_n$. 


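proof. If $x \in V$, then $x \in U_i$ for all $i = 1, \ldots, n$. Thus, for each $i$ there is an 
$\varepsilon_i > 0$ such that $B_{\varepsilon_i}(x) \subset U_i$. But then, setting $\varepsilon = \min\{\varepsilon_1, \ldots, \varepsilon_n\} > 0$, we have 
$B_\varepsilon(x) \subset \bigcap_{i=1}^n B_{\varepsilon_i}(x) \subset \bigcap_{i=1}^n U_i = V$. □ 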

Example 4.5 

The word “finite” is crucial in Theorem 4.4 because $\bigcap_{n=1}^\infty (-1/n, 1/n) = \{0\}$, and 
$\{0\}$ is not open in $\mathbb{R}$. (Why?) 
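Theorem 4.4 and Example 4.5 can both be seen in a small computation with open intervals in $\mathbb{R}$. In this sketch, the hypothetical function `common_radius` takes a point $x$ and a finite list of open intervals containing it, finds each $\varepsilon_i$ as in Example 4.1(b), and returns $\varepsilon = \min\{\varepsilon_1, \ldots, \varepsilon_n\}$. Exact arithmetic with `fractions.Fraction` is assumed.

```python
from fractions import Fraction


def common_radius(x, intervals):
    """Given x in each open interval (a, b) of a finite list, return
    eps = min(eps_1, ..., eps_n), where eps_i = min(x - a_i, b_i - x)
    is a radius with B_{eps_i}(x) inside the i-th interval (Theorem 4.4)."""
    radii = []
    for a, b in intervals:
        if not (a < x < b):
            raise ValueError(f"{x} is not in ({a}, {b})")
        radii.append(min(x - a, b - x))
    return min(radii)


# The first N intervals (-1/n, 1/n) of Example 4.5 share the radius 1/N at 0.
for N in (1, 10, 100):
    shrinking = [(-Fraction(1, n), Fraction(1, n)) for n in range(1, N + 1)]
    print(N, common_radius(Fraction(0), shrinking))
```

For the first $N$ intervals of Example 4.5 the common radius at $0$ is $1/N$, which tends to $0$ as $N$ grows; no single $\varepsilon > 0$ serves for all $n$ at once, consistent with the infinite intersection $\{0\}$ failing to be open.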

Now, since the real line $\mathbb{R}$ is of special interest to us, let’s characterize the open 
subsets of $\mathbb{R}$. This will come in handy later. But it should be stressed that while this 
characterization holds for $\mathbb{R}$, it does not have a satisfactory analogue even in $\mathbb{R}^2$. (As 
we will see in Chapter Six, not every open set in the plane can be written as a union of 
disjoint open disks.) 

Theorem 4.6. If $U$ is an open subset of $\mathbb{R}$, then $U$ may be written as a countable 
union of disjoint open intervals. That is, $U = \bigcup_{n=1}^\infty I_n$, where $I_n = (a_n, b_n)$ (these 
may be unbounded) and $I_n \cap I_m = \varnothing$ for $n \ne m$. 

proof. We know that $U$ can be written as a union of open intervals (because 
each $x \in U$ is in some open interval $I$ with $I \subset U$). What we need to show is 
that $U$ is a union of disjoint open intervals; such a union, as we know, must be 
countable (see Exercise 2.15). 

We first claim that each $x \in U$ is contained in a maximal open interval $I_x \subset U$ 
in the sense that if $x \in I \subset U$, where $I$ is an open interval, then we must have 
$I \subset I_x$. Indeed, given $x \in U$, let 

$$a_x = \inf\{a : (a, x] \subset U\} \quad\text{and}\quad b_x = \sup\{b : [x, b) \subset U\}.$$ 

Then $I_x = (a_x, b_x)$ satisfies $x \in I_x \subset U$, and $I_x$ is clearly maximal. (Check this!) 

Next, notice that for any $x, y \in U$ we have either $I_x \cap I_y = \varnothing$ or $I_x = I_y$. Why? 
Because if $I_x \cap I_y \ne \varnothing$, then $I_x \cup I_y$ is an open interval containing both $I_x$ and 
$I_y$. By maximality we would then have $I_x = I_y$. It follows that $U$ is the union of 
disjoint (maximal) intervals: $U = \bigcup_{x \in U} I_x$. □ 
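For a finite union of open intervals, the maximal intervals $I_x$ of Theorem 4.6 can be found by sorting and sweeping. The Python sketch below assumes the input is a finite list of pairs $(a, b)$ with $a < b$ (infinite endpoints allowed); it does not handle an arbitrary open set, which is what the inf/sup construction in the proof is for. The function name `components` is our own.

```python
import math


def components(intervals):
    """Merge a finite list of open intervals (a, b), a < b, into disjoint
    maximal open intervals with the same union (finite case of Theorem 4.6)."""
    result = []
    for a, b in sorted(intervals):
        # (a, b) meets the last component exactly when a < its right endpoint.
        if result and a < result[-1][1]:
            last_a, last_b = result[-1]
            result[-1] = (last_a, max(last_b, b))
        else:
            result.append((a, b))
    return result


print(components([(0, 1), (0.5, 2), (3, 4), (1.5, 2.5)]))  # [(0, 2.5), (3, 4)]
print(components([(-math.inf, 0), (-1, 5)]))  # [(-inf, 5)]
```

The comparison is strict on purpose: $(0, 1)$ and $(1, 2)$ stay separate components, since their union omits the point $1$.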

Now any time we make up a new definition in a metric space setting, it is usually 
very helpful to find an equivalent version stated exclusively in terms of sequences. To 
motivate this in the particular case of open sets, let’s recall: 

$$x_n \to x \iff (x_n) \text{ is eventually in } B_\varepsilon(x), \text{ for any } \varepsilon > 0$$ 

and hence 

$$x_n \to x \iff (x_n) \text{ is eventually in } U, \text{ for any open set } U \text{ containing } x.$$ 

(Why?) This last statement essentially characterizes open sets: 

Theorem 4.7. A set $U$ in $(M, d)$ is open if and only if, whenever a sequence 
$(x_n)$ in $M$ converges to a point $x \in U$, we have $x_n \in U$ for all but finitely 
many $n$. 

proof. The forward implication is clear from the remarks preceding the theorem. 
Let’s see why the new condition implies that U is open: 

If $U$ is not open, then there is an $x \in U$ such that $B_\varepsilon(x) \cap U^c \ne \varnothing$ for all $\varepsilon > 0$. 

In particular, for each $n$ there is some $x_n \in B_{1/n}(x) \cap U^c$. But then $(x_n) \subset U^c$ and 
$x_n \to x$. (Why?) Thus, the new condition also fails. □ 
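The proof of Theorem 4.7 builds a sequence $x_n \in B_{1/n}(x) \cap U^c$. Here is a concrete instance for the non-open set $U = [0, 1)$ of Example 4.1(b) at the point $x = 0$. The particular choice $x_n = -1/(2n)$ and the names `in_U` and `escaping_sequence` are ours; any point of $B_{1/n}(0) \cap U^c$ would do. The sequence converges to $0 \in U$, yet no term lies in $U$.

```python
from fractions import Fraction


def in_U(t):
    """Membership in U = [0, 1), which is not open in R (Example 4.1(b))."""
    return 0 <= t < 1


def escaping_sequence(n_terms):
    """x_n = -1/(2n) for n = 1, ..., n_terms: each term lies in B_{1/n}(0)
    but outside U, as in the proof of Theorem 4.7 at the point x = 0."""
    return [Fraction(-1, 2 * n) for n in range(1, n_terms + 1)]


xs = escaping_sequence(5)
print([str(t) for t in xs])
```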

In slightly different language, Theorem 4.7 is saying that the only way to reach a 
member of an open set is by traveling well inside the set; there are no inhabitants on 
the “frontier.” In essence, you cannot visit a single resident without seeing a whole 
neighborhood! 


Closed Sets 

What good would “open” be without “closed”? A set $F$ in a metric space $(M, d)$ is said 
to be a closed set if its complement $F^c = M \setminus F$ is open. 

We can draw several immediate (although not terribly enlightening) conclusions: 

Examples 4.8 

(a) $\varnothing$ and $M$ are always closed. (And so it is possible for a set to be both open and 
closed!) 

(b) An arbitrary intersection of closed sets is closed. A finite union of closed sets is 
closed. 
(c) Any finite set is closed. Indeed, it is enough to show that $\{x\}$ is always closed. 
(Why?) Given any $y \in M \setminus \{x\}$ (that is, any $y \ne x$), note that $\varepsilon = d(x, y) > 0$, 
and hence $B_\varepsilon(y) \subset M \setminus \{x\}$. 

(d) In $\mathbb{R}$, each of the intervals $[a, b]$, $[a, \infty)$, and $(-\infty, b]$ is closed. Also, $\mathbb{N}$ and the 
Cantor set $\Delta$ are closed sets. (Why?) 

(e) In a discrete space, every subset is closed. 

(f) Sets are not “doors”! The interval $(0, 1]$ is neither open nor closed in $\mathbb{R}$! 

As yet, our definition is not terribly useful. It would be nice if we had an intrinsic 
characterization of closed sets, something that did not depend on a knowledge of 
open sets: something in terms of sequences, for example. For this let’s first make an 
observation: $F$ is closed if and only if $F^c$ is open, and so $F$ is closed if and only if 

$$x \in F^c \implies B_\varepsilon(x) \subset F^c \text{ for some } \varepsilon > 0.$$ 

But this is the same as saying: $F$ is closed if and only if 

$$B_\varepsilon(x) \cap F \ne \varnothing \text{ for every } \varepsilon > 0 \implies x \in F. \qquad (4.1)$$ 

This is our first characterization of closed sets. (Compare this with the phrase “$F$ is not 
open,” as in the proof of Theorem 4.7. They are similar, but not the same!) 

Notice, please, that if $x \in F$, then $B_\varepsilon(x) \cap F \ne \varnothing$ necessarily follows; we are 
interested in the reverse implication here. In general, a point $x$ that satisfies $B_\varepsilon(x) \cap F \ne \varnothing$ 
for every $\varepsilon > 0$ is evidently “very close” to $F$ in the sense that $x$ cannot be separated from 
$F$ by any positive distance. At worst, $x$ might be on the “boundary” of $F$. Thus condition 
(4.1) is telling us that a set is closed if and only if it contains all such “boundary” points. 
Exercises 33, 40, and 41 make these notions more precise. For now, let’s translate 
condition (4.1) into a sequential characterization of closed sets. 

Theorem 4.9. Given a set $F$ in $(M, d)$, the following are equivalent: 

(i) $F$ is closed; that is, $F^c = M \setminus F$ is open. 

(ii) If $B_\varepsilon(x) \cap F \ne \varnothing$ for every $\varepsilon > 0$, then $x \in F$. 

(iii) If a sequence $(x_n) \subset F$ converges to some point $x \in M$, then $x \in F$. 

proof. (i) $\iff$ (ii): This is clear from our observations above and the definition 
of an open set. 

(ii) $\implies$ (iii): Suppose that $(x_n) \subset F$ and $x_n \to x \in M$. Then $B_\varepsilon(x)$ contains 
infinitely many $x_n$ for any $\varepsilon > 0$, and hence $B_\varepsilon(x) \cap F \ne \varnothing$ for any $\varepsilon > 0$. Thus 
$x \in F$, by (ii). 

(iii) $\implies$ (ii): If $B_\varepsilon(x) \cap F \ne \varnothing$ for all $\varepsilon > 0$, then for each $n$ there is an 
$x_n \in B_{1/n}(x) \cap F$. The sequence $(x_n)$ satisfies $(x_n) \subset F$ and $x_n \to x$. Hence, by 
(iii), $x \in F$. □ 

Condition (iii) of Theorem 4.9 is just a rewording of our sequential characterization 
of open sets (Theorem 4.7) applied to $U = F^c$. Most authors take (iii) as the definition 
of a closed set. Put simply, condition (iii) says that a closed set must contain all of 
its limit points. That is, “closed” means closed under the operation of taking limits. 
(Exercise 33 explores a slightly different, but more precise, notion of limit point.) 


EXERCISES 

1. Show that an “open rectangle” $(a, b) \times (c, d)$ is an open set in $\mathbb{R}^2$. More generally, 
if $A$ and $B$ are open in $\mathbb{R}$, show that $A \times B$ is open in $\mathbb{R}^2$. If $A$ and $B$ are closed in 
$\mathbb{R}$, show that $A \times B$ is closed in $\mathbb{R}^2$. 

2. If $F$ is a closed set and $G$ is an open set in a metric space $M$, show that $F \setminus G$ 
is closed and that $G \setminus F$ is open. 

> 3. Some authors say that two metrics $d$ and $\rho$ on a set $M$ are equivalent if they 
generate the same open sets. Prove that this agrees with our definition. (Recall that we have defined equivalence to 
mean that $d$ and $\rho$ generate the same convergent sequences. See Exercise 3.42.) 

4. Prove that every subset of a metric space M can be written as the intersection of 
open sets. 

> 5. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous. Show that $\{x : f(x) > 0\}$ is an open subset of 
$\mathbb{R}$ and that $\{x : f(x) = 0\}$ is a closed subset of $\mathbb{R}$. 

6. Give an example of an infinite closed set in R containing only irrationals. Is there 
an open set consisting entirely of irrationals? 

7. Show that every open set in R is the union of (countably many) open intervals 
with rational endpoints. Use this to show that the collection U of all open subsets of 
R has the same cardinality as R itself. 

> 8. Show that every open interval (and hence every open set) in R is a countable union 
of closed intervals and that every closed interval in R is a countable intersection of 
open intervals. 

9. Let $d$ be a metric on an infinite set $M$. Prove that there is an open set $U$ in $M$ 
such that both $U$ and its complement are infinite. [Hint: Either $(M, d)$ is discrete or 
it’s not.] 

10. Given $y = (y_n) \in H^\infty$, $N \in \mathbb{N}$, and $\varepsilon > 0$, show that $\{x = (x_n) \in H^\infty : 
|x_k - y_k| < \varepsilon,\ k = 1, \ldots, N\}$ is open in $H^\infty$ (see Exercise 3.10). 

> 11. Let $e^{(k)} = (0, \ldots, 0, 1, 0, \ldots)$, where the $k$th entry is $1$ and the rest are $0$s. 
Show that $\{e^{(k)} : k \ge 1\}$ is closed as a subset of $\ell_1$. 

12. Let $F$ be the set of all $x \in \ell_\infty$ such that $x_n = 0$ for all but finitely many $n$. Is 
$F$ closed? open? neither? Explain. 

13. Show that $c_0$ is a closed subset of $\ell_\infty$. [Hint: If $(x^{(n)})$ is a sequence (of sequences!) 
in $c_0$ converging to $x \in \ell_\infty$, note that $|x_k| \le |x_k - x_k^{(n)}| + |x_k^{(n)}|$ and now choose $n$ 
so that $|x_k - x_k^{(n)}|$ is small independent of $k$.] 

14. Show that the set $A = \{x \in \ell_2 : |x_n| \le 1/n,\ n = 1, 2, \ldots\}$ is a closed set 
in $\ell_2$ but that $B = \{x \in \ell_2 : |x_n| < 1/n,\ n = 1, 2, \ldots\}$ is not an open set. [Hint: 
Does $B \supset B_\varepsilon(0)$?] 
Now, as we’ve seen, some sets are neither open nor closed. However, it is possible 
to describe the “open part” of a set and the “closure” of a set. Here’s what we’ll do: 
Given a set $A$ in $(M, d)$, we define the interior of $A$, written $\operatorname{int}(A)$ or $A^\circ$, to be the 
largest open set contained in $A$. That is, 

$$\begin{aligned}
\operatorname{int}(A) = A^\circ &= \bigcup\{U : U \text{ is open and } U \subset A\} \\
&= \bigcup\{B_\varepsilon(x) : B_\varepsilon(x) \subset A \text{ for some } x \in A,\ \varepsilon > 0\} \quad \text{(Why?)} \\
&= \{x \in A : B_\varepsilon(x) \subset A \text{ for some } \varepsilon > 0\}.
\end{aligned}$$ 

Note that A° is clearly an open subset of A. 

We next define the closure of $A$, written $\operatorname{cl}(A)$ or $\bar{A}$, to be the smallest closed set 
containing $A$. That is, 

$$\operatorname{cl}(A) = \bar{A} = \bigcap\{F : F \text{ is closed and } A \subset F\}.$$ 

Please take note of the “dual” nature of our two new definitions. 

Now it is clear that $\bar{A}$ is a closed set containing $A$, and necessarily the smallest 
one. But it’s not so clear which points are in $\bar{A}$ or, more precisely, which points are 
in $\bar{A} \setminus A$. We could use a description of $\bar{A}$ that is a little easier to “test” on a given set 
$A$. It follows from our last theorem that $x \in \bar{A}$ if and only if $B_\varepsilon(x) \cap \bar{A} \ne \varnothing$ for every 
$\varepsilon > 0$. The description that we are looking for simply removes this last reference to $\bar{A}$. 

Proposition 4.10. $x \in \bar{A}$ if and only if $B_\varepsilon(x) \cap A \ne \varnothing$ for every $\varepsilon > 0$. 

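proof. One direction is easy: If $B_\varepsilon(x) \cap A \ne \varnothing$ for every $\varepsilon > 0$, then $B_\varepsilon(x) \cap \bar{A} \ne \varnothing$ 
for every $\varepsilon > 0$, and hence $x \in \bar{A}$ by Theorem 4.9. 

Now, for the other direction, let $x \in \bar{A}$ and let $\varepsilon > 0$. If $B_\varepsilon(x) \cap A = \varnothing$, then 
$A$ is a subset of $(B_\varepsilon(x))^c$, a closed set. Thus, $\bar{A} \subset (B_\varepsilon(x))^c$. (Why?) But this is a 
contradiction, because $x \in \bar{A}$ while $x \notin (B_\varepsilon(x))^c$. □ 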

Corollary 4.11. $x \in \bar{A}$ if and only if there is a sequence $(x_n) \subset A$ with $x_n \to x$. 

That is, $\bar{A}$ is the set of all limits of convergent sequences in $A$ (including limits of 
constant sequences). 

Example 4.12 

Here are a few easy examples in R. (Check the details!) 

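(a) $\operatorname{int}((0, 1]) = (0, 1)$ and $\operatorname{cl}((0, 1]) = [0, 1]$, 

(b) $\operatorname{int}(\{1/n : n \ge 1\}) = \varnothing$ and $\operatorname{cl}(\{1/n : n \ge 1\}) = \{1/n : n \ge 1\} \cup \{0\}$, 

(c) $\operatorname{int}(\mathbb{Q}) = \varnothing$ and $\operatorname{cl}(\mathbb{Q}) = \mathbb{R}$, 

(d) $\operatorname{int}(\Delta) = \varnothing$ and $\operatorname{cl}(\Delta) = \Delta$, where $\Delta$ is the Cantor set. 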
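Proposition 4.10 suggests a concrete test for Example 4.12(b): to see that $0 \in \operatorname{cl}(\{1/n : n \ge 1\})$, exhibit, for each $\varepsilon > 0$, a point of $A = \{1/n : n \ge 1\}$ in $B_\varepsilon(0)$. The hypothetical helper `witness` below does this with $n = \lfloor 1/\varepsilon \rfloor + 1$, so that $n > 1/\varepsilon$ and hence $1/n < \varepsilon$. Taking $\varepsilon = 1/k$ for $k = 1, 2, \ldots$ produces a sequence in $A$ converging to $0$, as in Corollary 4.11.

```python
from fractions import Fraction
import math


def witness(eps):
    """For eps > 0, return a point 1/n of A = {1/n : n >= 1} with |1/n - 0| < eps.

    n = floor(1/eps) + 1 exceeds 1/eps, so 1/n < eps; hence B_eps(0) meets A
    for every eps > 0, and 0 lies in cl(A) by Proposition 4.10.
    """
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = math.floor(1 / eps) + 1
    return Fraction(1, n)


# Taking eps = 1/k gives a sequence in A converging to 0 (Corollary 4.11).
print([str(witness(Fraction(1, k))) for k in range(1, 6)])
```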


EXERCISES 

Unless otherwise specified, each of the following exercises is set in a generic metric 
space $(M, d)$. 
15. The set $A = \{y \in M : d(x, y) \le r\}$ is sometimes called the closed ball about 
$x$ of radius $r$. Show that $A$ is a closed set, but give an example showing that $A$ need 
not equal the closure of the open ball $B_r(x)$. 

16. If $(V, \|\cdot\|)$ is any normed space, prove that the closed ball $\{x \in V : \|x\| \le 1\}$ 
is always the closure of the open ball $\{x \in V : \|x\| < 1\}$. 

> 17. Show that $A$ is open if and only if $A^\circ = A$ and that $A$ is closed if and only if 
$\bar{A} = A$. 

> 18. Given a nonempty bounded subset $E$ of $\mathbb{R}$, show that $\sup E$ and $\inf E$ are 
elements of $\bar{E}$. Thus $\sup E$ and $\inf E$ are elements of $E$ whenever $E$ is closed. 

> 19. Show that $\operatorname{diam}(A) = \operatorname{diam}(\bar{A})$. 

20. If $A \subset B$, show that $\bar{A} \subset \bar{B}$. Does $\bar{A} \subset \bar{B}$ imply $A \subset B$? Explain. 

21. If $A$ and $B$ are any sets in $M$, show that $\overline{A \cup B} = \bar{A} \cup \bar{B}$ and $\overline{A \cap B} \subset \bar{A} \cap \bar{B}$. 
Give an example showing that this last inclusion can be proper. 

22. True or false? $(A \cup B)^\circ = A^\circ \cup B^\circ$. 

23. If $x \ne y$ in $M$, show that there are disjoint open sets $U$, $V$ with $x \in U$ and 
$y \in V$. Moreover, show that $U$ and $V$ can be chosen so that even $\bar{U}$ and $\bar{V}$ are 
disjoint. 

24. Show that $\bar{A} = (\operatorname{int}(A^c))^c$ and that $A^\circ = (\operatorname{cl}(A^c))^c$. 

25. A set that is simultaneously open and closed is sometimes called a clopen set. 
Show that $\mathbb{R}$ has no nontrivial clopen sets. [Hint: If $U$ is a nontrivial open subset of 
$\mathbb{R}$, show that $\bar{U}$ is strictly bigger than $U$.] 

26. We define the distance from a point $x \in M$ to a nonempty set $A$ in $M$ by 
$d(x, A) = \inf\{d(x, a) : a \in A\}$. Prove that $d(x, A) = 0$ if and only if $x \in \bar{A}$. 

27. Show that $|d(x, A) - d(y, A)| \le d(x, y)$ and conclude that the map $x \mapsto 
d(x, A)$ is continuous. 

28. Given a set $A$ in $M$ and $\varepsilon > 0$, show that $\{x \in M : d(x, A) < \varepsilon\}$ is an open 
set and that $\{x \in M : d(x, A) \le \varepsilon\}$ is a closed set (and each contains $A$). 

29. Show that every closed set in $M$ is the intersection of countably many open sets 
and that every open set in $M$ is the union of countably many closed sets. [Hint: What 
is $\bigcap_{n=1}^\infty \{x \in M : d(x, A) < 1/n\}$?] 

30. 

(a) For each $n \in \mathbb{Z}$, let $F_n$ be a closed subset of $(n, n + 1)$. Show that $F = \bigcup_{n \in \mathbb{Z}} F_n$ 
is a closed set in $\mathbb{R}$. [Hint: For each fixed $n$, first show that there is a $\delta_n > 0$ so 
that $|x - y| > \delta_n$ whenever $x \in F_n$ and $y \in F_m$, $m \ne n$.] 

(b) Find a sequence of disjoint closed sets in R whose union is not closed. 

31. If $x \notin F$ where $F$ is closed, show that there are disjoint open sets $U$, $V$ with 
$x \in U$ and $F \subset V$. (This extends the first result in Exercise 23 since $\{y\}$ is closed.) 
Is it possible to find $U$ and $V$ so that $\bar{U}$ and $\bar{V}$ are disjoint? Is it possible to extend 
this result further to read: Any two disjoint closed sets are contained in disjoint open 
sets? 
32. We define the distance between two (nonempty) subsets $A$ and $B$ of $M$ by 
$d(A, B) = \inf\{d(a, b) : a \in A,\ b \in B\}$. Give an example of two disjoint closed 
sets $A$ and $B$ in $\mathbb{R}^2$ with $d(A, B) = 0$. 

> 33. Let $A$ be a subset of $M$. A point $x \in M$ is called a limit point of $A$ if every 
neighborhood of $x$ contains a point of $A$ that is different from $x$ itself, that is, if 
$(B_\varepsilon(x) \setminus \{x\}) \cap A \ne \varnothing$ for every $\varepsilon > 0$. If $x$ is a limit point of $A$, show that every 
neighborhood of $x$ contains infinitely many points of $A$. 

> 34. Show that $x$ is a limit point of $A$ if and only if there is a sequence $(x_n)$ in $A$ 
such that $x_n \to x$ and $x_n \ne x$ for all $n$. 

35. Let $A'$ be the set of limit points of a set $A$. Show that $A'$ is closed and that $\bar{A} = 
A' \cup A$. Show that $A' \subset A$ if and only if $A$ is closed. ($A'$ is called the derived set 
of $A$.) 

36. Suppose that $x_n \to x \in M$, and let $A = \{x\} \cup \{x_n : n \ge 1\}$. Prove that $A$ is 
closed. 

37. Prove the Bolzano-Weierstrass theorem: Every bounded infinite subset of $\mathbb{R}$ 
has a limit point. [Hint: Use the nested interval theorem. If $A$ is a bounded infinite 
subset of $\mathbb{R}$, then $A$ is contained in some closed bounded interval $I_1$. At least one of 
the left or right halves of $I_1$ contains infinitely many points of $A$. Call this new closed 
interval $I_2$. Continue.] 

38. A set $P$ is called perfect if it is empty or if it is a closed set and every point of $P$ 
is a limit point of $P$. Show that the Cantor set $\Delta$ is perfect. Show that $\mathbb{R}$ is perfect when considered 
as a subset of $\mathbb{R}^2$. 

39. Show that a nonempty perfect subset $P$ of $\mathbb{R}$ is uncountable. This gives yet 
another proof that the Cantor set is uncountable. [Hint: First convince yourself that 
$P$ is infinite, and assume that $P$ is countable, say $P = \{x_1, x_2, \ldots\}$. Construct a 
decreasing sequence of nested closed intervals $[a_n, b_n]$ such that $(a_n, b_n) \cap P \ne \varnothing$ 
but $x_n \notin [a_n, b_n]$. Use the nested interval theorem to get a contradiction.] 

40. If $x \in A$ and $x$ is not a limit point of $A$, then $x$ is called an isolated point of $A$. 
Show that a point $x \in A$ is an isolated point of $A$ if and only if $(B_\varepsilon(x) \setminus \{x\}) \cap A = \varnothing$ 
for some $\varepsilon > 0$. Prove that a subset of $\mathbb{R}$ can have at most countably many isolated 
points, thus showing that every uncountable subset of $\mathbb{R}$ has a limit point. 

41. Related to the notion of limit points and isolated points are boundary points. A 
point $x \in M$ is said to be a boundary point of $A$ if each neighborhood of $x$ hits 
both $A$ and $A^c$. In symbols, $x$ is a boundary point of $A$ if and only if $B_\varepsilon(x) \cap A \ne \varnothing$ 
and $B_\varepsilon(x) \cap A^c \ne \varnothing$ for every $\varepsilon > 0$. Verify each of the following formulas, where 
$\operatorname{bdry}(A)$ denotes the set of boundary points of $A$: 

(a) $\operatorname{bdry}(A) = \operatorname{bdry}(A^c)$, 

(b) $\operatorname{cl}(A) = \operatorname{bdry}(A) \cup \operatorname{int}(A)$, 

(c) $M = \operatorname{int}(A) \cup \operatorname{bdry}(A) \cup \operatorname{int}(A^c)$. 

Notice that the first and last equations tell us that each set A partitions M into three 
regions: the points “well inside” A, the points “well outside” A, and the points on the 
common boundary of A and A c . 
42. If $E$ is a nonempty bounded subset of $\mathbb{R}$, show that $\sup E$ and $\inf E$ are both 
boundary points of $E$. Hence, if $E$ is also closed, then $\sup E$ and $\inf E$ are elements 
of $E$. 

43. Show that $\operatorname{bdry}(A)$ is always a closed set; in fact, $\operatorname{bdry}(A) = \bar{A} \setminus A^\circ$. 

44. Show that $A$ is closed if and only if $\operatorname{bdry}(A) \subset A$. 

45. Give examples showing that $\operatorname{bdry}(A) = \varnothing$ and $\operatorname{bdry}(A) = M$ are both possible. 

> 46. A set $A$ is said to be dense in $M$ (or, as some authors say, everywhere dense) if 
$\bar{A} = M$. For example, both $\mathbb{Q}$ and $\mathbb{R} \setminus \mathbb{Q}$ are dense in $\mathbb{R}$. Show that $A$ is dense in $M$ 
if and only if any of the following hold: 

(a) Every point in $M$ is the limit of a sequence from $A$. 

(b) $B_\varepsilon(x) \cap A \ne \varnothing$ for every $x \in M$ and every $\varepsilon > 0$. 

(c) $U \cap A \ne \varnothing$ for every nonempty open set $U$. 

(d) $A^c$ has empty interior. 

47. Let $G$ be open and let $D$ be dense in $M$. Show that $\overline{G \cap D} = \bar{G}$. Give an 
example showing that this equality may fail if $G$ is not open. 

> 48. A metric space is called separable if it contains a countable dense subset. Find 
examples of countable dense sets in $\mathbb{R}$, in $\mathbb{R}^2$, and in $\mathbb{R}^n$. 

49. Prove that $\ell_2$ and $H^\infty$ are separable. [Hint: Consider finitely nonzero sequences 
of the form $(r_1, \ldots, r_n, 0, 0, \ldots)$, where each $r_k$ is rational.] 

50. Show that $\ell_\infty$ is not separable. [Hint: Consider the set $2^{\mathbb{N}}$, consisting of all 
sequences of 0s and 1s, as a subset of $\ell_\infty$. We know that $2^{\mathbb{N}}$ is uncountable. Now 
what?] 

51. Show that a separable metric space has at most countably many isolated 
points. 

52. If M is separable, show that any collection of disjoint open sets in M is at most 
countable. 

53. Can you find a countable dense subset of $C[0, 1]$? 

54. A set $A$ is said to be nowhere dense in $M$ if $\operatorname{int}(\operatorname{cl}(A)) = \varnothing$. Show that $\{x\}$ is 
nowhere dense if and only if $x$ is not an isolated point of $M$. 

55. Show that every finite subset of R is nowhere dense. Is every countable subset 
of R nowhere dense? Show that the Cantor set is nowhere dense in R. 

56. If $A$ and $B$ are nowhere dense in $M$, show that $A \cup B$ is nowhere dense. Give 
an example showing that an infinite union of nowhere dense sets need not be nowhere 
dense. 

57. If $A$ is closed, show that $A$ is nowhere dense if and only if $A^c$ is dense if and 
only if $A$ has an empty interior. 

58. Let $(r_n)$ be an enumeration of $\mathbb{Q}$. For each $n$, let $I_n$ be the open interval centered 
at $r_n$ of radius $2^{-n}$, and let $U = \bigcup_{n=1}^\infty I_n$. Prove that $U$ is a proper, open, dense subset 
of $\mathbb{R}$ and that $U^c$ is nowhere dense in $\mathbb{R}$. 
59. If $A$ is closed, show that $\operatorname{bdry}(A)$ is nowhere dense. 

60. Show that each of the following is equivalent to the statement “A is nowhere 
dense”: 

(a) A contains no nonempty open set. 

(b) Each nonempty open set in M contains a nonempty open subset that is disjoint 
from A. 

(c) Each nonempty open set in M contains an open ball that is disjoint from A. 


The Relative Metric 

Although it is a digression at this point, we need to generate some terminology for 
later use. First, given a nontrivial subset $A$ of a metric space $(M, d)$, recall that $A$ 
“inherits” the metric $d$ by restriction. Thus, the metric space $(A, d)$ has open sets, 
closed sets, convergent sequences, and so on, of its own. How are these related to the 
open sets, closed sets, convergent sequences, and so on, of $(M, d)$? The answer comes 
from examining the open balls in $(A, d)$. Note that for $x \in A$ we have 

$$B_\varepsilon^A(x) = \{a \in A : d(x, a) < \varepsilon\} = A \cap \{y \in M : d(x, y) < \varepsilon\} = A \cap B_\varepsilon^M(x),$$ 

where superscripts have been used to distinguish between a ball in $A$ and a ball in $M$. 
Thus, a subset $G$ of $A$ is open in $(A, d)$, or open relative to $A$, if, given $x \in G$, there is 
some $\varepsilon > 0$ such that 

$$G \supset B_\varepsilon^A(x) = A \cap B_\varepsilon^M(x).$$ 

This observation leads us to the following: 

Proposition 4.13. Let $A \subset M$. 

(i) A set $G \subset A$ is open in $(A, d)$ if and only if $G = A \cap U$, where $U$ is open in 
$(M, d)$. 

(ii) A set $F \subset A$ is closed in $(A, d)$ if and only if $F = A \cap C$, where $C$ is closed 
in $(M, d)$. 

(iii) $\operatorname{cl}_A(E) = A \cap \operatorname{cl}_M(E)$ for any subset $E$ of $A$ (where the subscripts distinguish 
between the closure of $E$ in $(A, d)$ and the closure of $E$ in $(M, d)$). 

proof. We will prove (i) and leave the rest as exercises. 

First suppose that $G = A \cap U$, where $U$ is open in $(M, d)$. If $x \in G \subset U$, then 
$x \in B_\varepsilon^M(x) \subset U$ for some $\varepsilon > 0$. But since $G \subset A$, we have $x \in A \cap B_\varepsilon^M(x) = 
B_\varepsilon^A(x) \subset A \cap U = G$. Thus, $G$ is open in $(A, d)$. 

Next suppose that $G$ is open in $(A, d)$. Then, for each $x \in G$, there is some 
$\varepsilon_x > 0$ such that $x \in B_{\varepsilon_x}^A(x) = A \cap B_{\varepsilon_x}^M(x) \subset G$. But now it is clear that $U = 
\bigcup\{B_{\varepsilon_x}^M(x) : x \in G\}$ is an open set in $(M, d)$ satisfying $G = A \cap U$. □ 
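The second half of the proof of Proposition 4.13(i) can be carried out explicitly when $A$ is a finite subset of $\mathbb{R}$, an assumption made only for this illustration (in that case every subset of $A$ is open in $A$). The sketch below, with hypothetical names `ambient_open_set` and `in_union`, chooses each radius $\varepsilon_x$ as the distance from $x$ to the points of $A$ outside $G$, so that $B_{\varepsilon_x}^A(x) \subset G$, and returns the intervals $B_{\varepsilon_x}^M(x)$ whose union is $U$.

```python
from fractions import Fraction


def ambient_open_set(A, G):
    """For a finite set A of reals and a subset G of A, return open intervals
    (x - r, x + r), one for each x in G, whose union U satisfies: G equals
    the set of points of A lying in U.

    r is the distance from x to the points of A outside G (or 1 if there are
    none), so each interval meets A only in points of G.
    """
    outside = [a for a in A if a not in G]
    U = []
    for x in sorted(G):
        r = min((abs(x - a) for a in outside), default=Fraction(1))
        U.append((x - r, x + r))
    return U


def in_union(t, U):
    """Membership in the union of the open intervals listed in U."""
    return any(lo < t < hi for lo, hi in U)
```

For example, with $A = \{0, 1/2, 1, 2\}$ and $G = \{1/2, 2\}$ the sketch produces the intervals $(0, 1)$ and $(1, 3)$, and the points of $A$ lying in their union are exactly $1/2$ and $2$.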

We paraphrase the statement “$G$ is open in $(A, d)$” by saying that “$G$ is open in $A$,” 
or “$G$ is open relative to $A$,” or perhaps “$G$ is relatively open in $A$.” The same goes for 
closed sets. In the case of closures, the symbol $\operatorname{cl}_A(E)$ is read “the closure of $E$ in 
$A$.” Another notation for $\operatorname{cl}_A(E)$ is $\bar{E}^A$. 

Examples 4.14 

(a) Let $A = (0, 1] \cup \{2\}$, considered as a subset of $\mathbb{R}$. Then $(0, 1]$ is open in $A$ and 
$\{2\}$ is both open and closed in $A$. (Why?) 

(b) We may consider $\mathbb{R}$ as a subset of $\mathbb{R}^2$ in an obvious way: all pairs of the form 
$(x, 0)$, $x \in \mathbb{R}$. The metric that $\mathbb{R}$ inherits from $\mathbb{R}^2$ in this way is nothing but the 
usual metric on $\mathbb{R}$. (Why?) Similarly, $\mathbb{R}^2$ may be considered as a natural subset 
of $\mathbb{R}^3$ (as the $xy$-plane, for instance). What happens in this case? Figure 4.2 
might help. 



EXERCISES 

Throughout, $M$ denotes an arbitrary metric space with metric $d$. 

> 61. Complete the proof of Proposition 4.13. 

> 62. Suppose that $A$ is open in $(M, d)$ and that $G \subset A$. Show that $G$ is open in $A$ 
if and only if $G$ is open in $M$. Is the result still true if “open” is replaced everywhere 
by “closed”? Explain. 

63. Is there a nonempty subset of $\mathbb{R}$ that is open when considered as a subset of $\mathbb{R}^2$? 
closed? 

64. Show that the analogue of part (iii) of Proposition 4.13 for relative interiors is false. Specifically, find sets $E \subset A \subset \mathbb{R}$ such that $\operatorname{int}_A(E) = A$ while 
$\operatorname{int}_{\mathbb{R}}(E) = \varnothing$. 

65. Let $A$ be a subset of $M$. If $G$ and $H$ are disjoint open sets in $A$, show that 
there are disjoint open sets $U$ and $V$ in $M$ such that $G = U \cap A$ and $H = V \cap A$. 
[Hint: Let $U = \bigcup\{B_{\varepsilon/2}^M(x) : x \in G \text{ and } B_\varepsilon^A(x) \subset G\}$. Do the same for $V$ and 
$H$.] 

66. Let $A \subset B \subset M$. If $A$ is dense in $B$ (how would you define this?), and if $B$ is 
dense in $M$, show that $A$ is dense in $M$. 

67. Let $G$ be open and let $D$ be dense in $M$. Show that $G \cap D$ is dense in $G$. Give 
an example showing that this may fail if $G$ is not open. 

68. If $A$ is a separable subset of $M$ (that is, if $A$ has a countable dense subset of its 
own), show that $\bar{A}$ is also separable. 
69. A collection $(U_\alpha)$ of open sets is called an open base for $M$ if every open set in 
$M$ can be written as a union of the $U_\alpha$. For example, the collection of all open intervals 
in $\mathbb{R}$ with rational endpoints is an open base for $\mathbb{R}$ (and this is even a countable 
collection). (Why?) Prove that $M$ has a countable open base if and only if $M$ is 
separable. [Hint: If $\{x_n\}$ is a countable dense set in $M$, consider the collection of open 
balls with rational radii centered at the $x_n$.] 



Notes and Remarks 

For sets of real numbers, the concepts of neighborhoods, limit points (Exercise 33), 
derived sets (Exercise 35), perfect sets (Exercise 38), closed sets, and the characterization of open sets (Theorem 4.6) are all due to Cantor. Fréchet introduced separable 
spaces (Exercise 48). Much of the terminology that we use today is based on that used 
by either Fréchet or Hausdorff. For more details on the history of these notions see 
Dudley [1989], Manheim [1964], Taylor [1982], and Willard [1970]; also see Fréchet 
[1928], Hausdorff [1937], and Hobson [1927]. 

For an alternate proof of Theorem 4.6, see Labarre [1965], and for more on “Cantor-like” nowhere dense subsets of $\mathbb{R}$ (as in Exercise 58), see the short note in Wilansky 
[1953b]. 
Continuous Functions 

Throughout this chapter, unless otherwise specified, $(M, d)$ and $(N, \rho)$ are arbitrary 
metric spaces and $f : M \to N$ is a function mapping $M$ into $N$. We say that $f$ is 
continuous at a point $x \in M$ if: 

for every $\varepsilon > 0$, there is a $\delta > 0$ (which depends on $f$, $x$, and $\varepsilon$) such 
that $\rho(f(x), f(y)) < \varepsilon$ whenever $y \in M$ satisfies $d(x, y) < \delta$. 

Recall from our earlier discussions that we may rephrase this definition (how?) to read: 

$f$ is continuous at $x$ if, for any $\varepsilon > 0$, there is a $\delta > 0$ such that 
$f(B_\delta^d(x)) \subset B_\varepsilon^\rho(f(x))$ or, equivalently, $B_\delta^d(x) \subset f^{-1}(B_\varepsilon^\rho(f(x)))$. 

If $f$ is continuous at every point of $M$, we simply say that $f$ is continuous on $M$, or 
often just that $f$ is continuous. 
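To make the $\varepsilon$-$\delta$ formulation concrete, consider the function $f(x) = x^2$ on $\mathbb{R}$, which is our own example and not one from the text. If $|y - x| < \delta \le 1$, then $|y + x| < 2|x| + 1$, so $|y^2 - x^2| = |y - x|\,|y + x| < \delta(2|x| + 1)$; hence $\delta = \min\{1, \varepsilon/(2|x| + 1)\}$ works at the point $x$. The hypothetical sketch below computes this $\delta$ and spot-checks $f(B_\delta(x)) \subset B_\varepsilon(f(x))$ on a grid of sample points, using exact rational arithmetic.

```python
from fractions import Fraction


def delta_for_square(x, eps):
    """A delta for f(x) = x**2 at the point x: if |y - x| < delta, then
    |y**2 - x**2| = |y - x| * |y + x| < delta * (2|x| + 1) <= eps."""
    x, eps = Fraction(x), Fraction(eps)
    return min(Fraction(1), eps / (2 * abs(x) + 1))


def spot_check(x, eps, samples=1000):
    """Check |y**2 - x**2| < eps at evenly spaced y with |y - x| < delta."""
    x, eps = Fraction(x), Fraction(eps)
    delta = delta_for_square(x, eps)
    for k in range(-samples + 1, samples):
        y = x + delta * Fraction(k, samples)
        if not abs(y * y - x * x) < eps:
            return False
    return True
```

Note that $\delta$ depends on $x$ as well as on $\varepsilon$, exactly as the definition allows.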

By now it should be clear that any statement concerning arbitrary open balls will 
translate into a statement concerning arbitrary open sets. Thus, there is undoubtedly 
a characterization of continuity available that may be stated exclusively in terms of 
open sets. Of course, any statement concerning open sets probably has a counterpart 
using closed sets. And don’t forget sequences! Open sets and closed sets can each 
be characterized in terms of convergent sequences, and so we would expect to find a 
characterization of continuity in terms of convergent sequences, too. At any rate, we’ve 
done enough hinting around about reformulations of the definition of continuity. It’s 
time to put our cards on the table. 

Theorem 5.1. Given $f : (M, d) \to (N, \rho)$, the following are equivalent: 

(i) $f$ is continuous on $M$ (by the $\varepsilon$-$\delta$ definition). 

(ii) For any $x \in M$, if $x_n \to x$ in $M$, then $f(x_n) \to f(x)$ in $N$. 

(iii) If $E$ is closed in $N$, then $f^{-1}(E)$ is closed in $M$. 

(iv) If $V$ is open in $N$, then $f^{-1}(V)$ is open in $M$. 

proof. (i) $\implies$ (ii): (Compare this with the case $f : \mathbb{R} \to \mathbb{R}$.) Suppose that 
$x_n \to x$. Given $\varepsilon > 0$, let $\delta > 0$ be such that $f(B_\delta^d(x)) \subset B_\varepsilon^\rho(f(x))$. Then, since 
$x_n \to x$, we have that $(x_n)$ is eventually in $B_\delta^d(x)$. But this implies that $(f(x_n))$ is 
eventually in $B_\varepsilon^\rho(f(x))$. Since $\varepsilon$ is arbitrary, this means that $f(x_n) \to f(x)$. 
(ii) $\implies$ (iii): Let $E$ be closed in $(N, \rho)$. Given $(x_n) \subset f^{-1}(E)$ such that $x_n \to 
x \in M$, we need to show that $x \in f^{-1}(E)$. But $(x_n) \subset f^{-1}(E)$ implies that 
$(f(x_n)) \subset E$, while $x_n \to x \in M$ tells us that $f(x_n) \to f(x)$ by (ii). Thus, since 
$E$ is closed, we have that $f(x) \in E$; that is, $x \in f^{-1}(E)$. 

(iii) $\iff$ (iv) is obvious, since $f^{-1}(A^c) = (f^{-1}(A))^c$. See Exercise 1. 

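(iv) $\implies$ (i): Given $x \in M$ and $\varepsilon > 0$, the set $B_\varepsilon^\rho(f(x))$ is open in $(N, \rho)$ and so, by 
(iv), the set $f^{-1}(B_\varepsilon^\rho(f(x)))$ is open in $(M, d)$. But then $B_\delta^d(x) \subset f^{-1}(B_\varepsilon^\rho(f(x)))$ 
for some $\delta > 0$, because $x \in f^{-1}(B_\varepsilon^\rho(f(x)))$. □ 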

Example 5.2 

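(a) Define $\chi_{\mathbb{Q}} : \mathbb{R} \to \mathbb{R}$ by $\chi_{\mathbb{Q}}(x) = 1$ if $x \in \mathbb{Q}$, and $\chi_{\mathbb{Q}}(x) = 0$ if $x \notin \mathbb{Q}$. Then 
$\chi_{\mathbb{Q}}^{-1}(B_{1/3}(1)) = \mathbb{Q}$ and $\chi_{\mathbb{Q}}^{-1}(B_{1/3}(0)) = \mathbb{R} \setminus \mathbb{Q}$. Thus $\chi_{\mathbb{Q}}$ cannot be continuous at 
any point of $\mathbb{R}$ because neither $\mathbb{Q}$ nor $\mathbb{R} \setminus \mathbb{Q}$ contains an interval. 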

(b) A function $f : M \to N$ between metric spaces is called an isometry (into) if 
$f$ preserves distances: $\rho(f(x), f(y)) = d(x, y)$ for all $x, y \in M$. Obviously, an 
isometry is continuous. The natural inclusions from $\mathbb{R}$ into $\mathbb{R}^2$ (i.e., $x \mapsto (x, 0)$) 
and from $\mathbb{R}^2$ into $\mathbb{R}^3$ (this time $(x, y) \mapsto (x, y, 0)$) are isometries. (Why?) 

(c) Let $f : \mathbb{N} \to \mathbb{R}$ be any function. Then $f$ is continuous! Why? Because $\{n\}$ is an 
open ball in $\mathbb{N}$. Specifically, $\{n\} = B_{1/2}(n) \subset f^{-1}(B_\varepsilon(f(n)))$ for any $\varepsilon > 0$. 

(d) $f : \mathbb{R} \to \mathbb{N}$ is continuous if and only if $f$ is constant! Why? [Hint: See Exercise 
4.25.] 

(e) Relative continuity can sometimes be counterintuitive. From (a) we know that 
$\chi_{\mathbb{Q}}$ has no points of continuity relative to $\mathbb{R}$, but the restriction of $\chi_{\mathbb{Q}}$ to $\mathbb{Q}$ is 
everywhere continuous relative to $\mathbb{Q}$! Why? (See Exercise 9 for more details.) 

(f) If $y$ is any fixed element of $(M, d)$, then the real-valued function $f(x) = d(x, y)$ 
is continuous on $M$. As we will see, even more is true (see Exercises 20 and 34). 
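Example 5.2(f) rests on the reverse triangle inequality $|d(x, y) - d(z, y)| \le d(x, z)$, which shows that $\delta = \varepsilon$ works at every point. The sketch below spot-checks this inequality for random points of $\mathbb{R}^2$ with the Euclidean metric, an illustrative assumption; the function name `reverse_triangle_check` is hypothetical. Random sampling can only fail to find a counterexample, so this proves nothing, and a small tolerance absorbs floating-point rounding.

```python
import math
import random


def reverse_triangle_check(y, trials=10_000, seed=1, tol=1e-9):
    """Spot-check |d(x, y) - d(z, y)| <= d(x, z) for random x, z in R^2,
    where d is the Euclidean metric; tol absorbs floating-point rounding."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        z = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        if abs(math.dist(x, y) - math.dist(z, y)) > math.dist(x, z) + tol:
            return False
    return True
```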


EXERCISES 

Throughout, $M$ denotes an arbitrary metric space with metric $d$. 

> 1. Given a function $f : S \to T$ and sets $A, B \subset S$ and $C, D \subset T$, establish the 
following: 

(i) $A \subset f^{-1}(f(A))$, with equality for all $A$ if and only if $f$ is one-to-one. 

(ii) $f(f^{-1}(C)) \subset C$, with equality for all $C$ if and only if $f$ is onto. 

(iii) $f(A \cup B) = f(A) \cup f(B)$. 

(iv) $f^{-1}(C \cup D) = f^{-1}(C) \cup f^{-1}(D)$. 

(v) $f(A \cap B) \subset f(A) \cap f(B)$, with equality for all $A$ and $B$ if and only if $f$ is 
one-to-one. 

(vi) $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$. 

(vii) $f(A) \setminus f(B) \subset f(A \setminus B)$. 

(viii) $f^{-1}(C \setminus D) = f^{-1}(C) \setminus f^{-1}(D)$. 

Generalize, wherever possible, to arbitrary unions and intersections. 
> 2. Given a subset $A$ of some “universal” set $S$, we define $\chi_A : S \to \mathbb{R}$, the characteristic function of $A$, by $\chi_A(x) = 1$ if $x \in A$ and $\chi_A(x) = 0$ if $x \notin A$. Prove or 
disprove the following formulas: $\chi_{A \cup B} = \chi_A + \chi_B$, $\chi_{A \cap B} = \chi_A \chi_B$, $\chi_{A \setminus B} = 
\chi_A - \chi_B$. What corrections are necessary? 

3. If $f : A \to B$ and $C \subset B$, what is $\chi_C \circ f$ (as a characteristic function)? 

4. Show that $\chi_\Delta : \mathbb{R} \to \mathbb{R}$, the characteristic function of the Cantor set $\Delta$, is discontinuous at each point of $\Delta$. 

5. Is there a continuous characteristic function on $\mathbb{R}$? If $A \subset \mathbb{R}$, show that $\chi_A$ is 
continuous at each point of $\operatorname{int}(A)$. Are there any other points of continuity? 

6. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous. Show that $\{x : f(x) > 0\}$ is an open subset of $\mathbb{R}$ 
and that $\{x : f(x) = 0\}$ is a closed subset of $\mathbb{R}$. If $f(x) = 0$ whenever $x$ is rational, 
show that $f(x) = 0$ for every real $x$. 

7. 

(a) If $f : M \to \mathbb{R}$ is continuous and $a \in \mathbb{R}$, show that the sets $\{x : f(x) > a\}$ and 
$\{x : f(x) < a\}$ are open subsets of $M$. 

(b) Conversely, if the sets $\{x : f(x) > a\}$ and $\{x : f(x) < a\}$ are open for every 
$a \in \mathbb{R}$, show that $f$ is continuous. 

(c) Show that $f$ is continuous even if we assume only that the sets $\{x : f(x) > a\}$ 
and $\{x : f(x) < a\}$ are open for every rational $a$. 

> 8. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous. 

(a) If $f(0) > 0$, show that $f(x) > 0$ for all $x$ in some interval $(-a, a)$. 

(b) If $f(x) \ge 0$ for every rational $x$, show that $f(x) \ge 0$ for all real $x$. Will this 
result hold with “$\ge 0$” replaced by “$> 0$”? Explain. 

▷ 9. Let $A \subset M$. Show that $f : (A, d) \to (N, \rho)$ is continuous at $a \in A$ if and only if, given $\varepsilon > 0$, there is a $\delta > 0$ such that $\rho(f(x), f(a)) < \varepsilon$ whenever $d(x, a) < \delta$ and $x \in A$. We paraphrase this statement by saying that “$f$ has a point of continuity relative to $A$.”

10. Let $A = (0, 1] \cup \{2\}$, considered as a subset of $\mathbb{R}$. Show that every function $f : A \to \mathbb{R}$ is continuous, relative to $A$, at $2$.

11. Let $A$ and $B$ be subsets of $M$, and let $f : M \to \mathbb{R}$. Prove or disprove the following statements:

(a) If $f$ is continuous at each point of $A$ and $f$ is continuous at each point of $B$, then $f$ is continuous at each point of $A \cup B$.

(b) If $f|_A$ is continuous, relative to $A$, and $f|_B$ is continuous, relative to $B$, then $f|_{A \cup B}$ is continuous, relative to $A \cup B$.

If either statement is not true in general, what modifications are necessary to make 
it so? 

12. Let $I = (\mathbb{R} \setminus \mathbb{Q}) \cap [0, 1]$ with its usual metric. Prove that there is a continuous function $g$ mapping $I$ onto $\mathbb{Q} \cap [0, 1]$.

13. Let $(r_n)$ be an enumeration of the rationals in $[0, 1]$ and define $f$ on $[0, 1]$ by $f(x) = \sum_{r_n < x} 2^{-n}$. Show that $f$ is discontinuous at each rational point of $[0, 1)$ but that $f$ is everywhere continuous when considered as a function on only $[0, 1] \setminus \mathbb{Q}$.
14. A continuous function on $\mathbb{R}$ is completely determined by its values on $\mathbb{Q}$. Use this to “count” the continuous functions $f : \mathbb{R} \to \mathbb{R}$.

15. Suppose that $f : \mathbb{R} \to \mathbb{R}$ satisfies $f(x + y) = f(x) + f(y)$ for every $x, y \in \mathbb{R}$. If $f$ is continuous at some point $x_0 \in \mathbb{R}$, prove that there is some constant $a \in \mathbb{R}$ such that $f(x) = ax$ for all $x \in \mathbb{R}$. That is, an additive function that is continuous at even one point is linear, and hence continuous on all of $\mathbb{R}$.

16. Let $f : \mathbb{R} \to \mathbb{R}$, and define $G : \mathbb{R} \to \mathbb{R}^2$ by $G(x) = (x, f(x))$, so that the range of $G$ is the graph of $f$. Show that $f$ is continuous if and only if $G$ is continuous if and only if both of the sets $A = \{(x, y) : y \le f(x)\}$ and $B = \{(x, y) : y \ge f(x)\}$ are closed in $\mathbb{R}^2$. In particular, if $f$ is continuous, then the graph of $f$ is closed in $\mathbb{R}^2$.

▷ 17. Let $f, g : (M, d) \to (N, \rho)$ be continuous, and let $D$ be a dense subset of $M$. If $f(x) = g(x)$ for all $x \in D$, show that $f(x) = g(x)$ for all $x \in M$. If $f$ is onto, show that $f(D)$ is dense in $N$.

18. Let $f : (M, d) \to (N, \rho)$ be continuous, and let $A$ be a separable subset of $M$. Prove that $f(A)$ is separable.

▷ 19. A function $f : \mathbb{R} \to \mathbb{R}$ is said to satisfy a Lipschitz condition if there is a constant $K < \infty$ such that $|f(x) - f(y)| \le K|x - y|$ for all $x, y \in \mathbb{R}$. More economically, we may say that $f$ is Lipschitz (or Lipschitz with constant $K$ if a particular constant seems to matter). Show that $\sin x$ is Lipschitz with constant $K = 1$. Prove that a Lipschitz function is (uniformly) continuous.
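A quick numerical sanity check on the claim about $\sin x$ may be reassuring, although it proves nothing: the sketch below samples random pairs and records the largest difference quotient $|\sin x - \sin y| / |x - y|$, a number that any Lipschitz constant must dominate. The helper names and the sampling parameters are illustrative choices, not part of the text.

```python
import math
import random


def lipschitz_ratio(f, x, y):
    """Difference quotient |f(x) - f(y)| / |x - y| for x != y."""
    return abs(f(x) - f(y)) / abs(x - y)


def max_sampled_ratio(f, trials=10_000, spread=100.0, seed=0):
    """Largest difference quotient of f over random pairs in [-spread, spread].

    Any Lipschitz constant K for f must be at least this large; sampling can
    only suggest a constant, never prove one.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        x = rng.uniform(-spread, spread)
        y = rng.uniform(-spread, spread)
        if x != y:
            best = max(best, lipschitz_ratio(f, x, y))
    return best


ratio = max_sampled_ratio(math.sin)
print(f"largest sampled ratio for sin: {ratio:.6f}")
```

The proof itself still has to come from the mean value theorem (or a sum-to-product identity); the sampled ratio merely stays below $1$, as $K = 1$ predicts.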

▷ 20. If $d$ is a metric on $M$, show that $|d(x, z) - d(y, z)| \le d(x, y)$ and conclude that the function $f(x) = d(x, z)$ is continuous on $M$ for any fixed $z \in M$. This says that $d(x, y)$ is separately continuous, that is, continuous in each variable separately.

21. If $x \ne y$ in $M$, show that there are disjoint open sets $U$, $V$ with $x \in U$ and $y \in V$. Moreover, $U$ and $V$ can be chosen so that $\overline{U}$ and $\overline{V}$ are disjoint.

22. Define $E : \mathbb{N} \to \ell_1$ by $E(n) = (1, \ldots, 1, 0, \ldots)$, where the first $n$ entries are $1$ and the rest $0$. Show that $E$ is an isometry (into).
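To see concretely why $E$ should be an isometry, note that $E(n)$ and $E(m)$ differ in exactly $|n - m|$ coordinates, each by $1$. The sketch below checks $\|E(n) - E(m)\|_1 = |n - m|$ for small $n$, $m$; it assumes that a finitely supported sequence may be stored as a list truncated after its last nonzero entry, which loses nothing in the $\ell_1$ distance. The function names are hypothetical.

```python
def E(n, length):
    """First `length` coordinates of E(n) = (1, ..., 1, 0, ...) with n ones.

    Every coordinate past max(n, m) is 0 for both E(n) and E(m), so truncating
    at length >= max(n, m) does not change the l1 distance between them.
    """
    if not 0 <= n <= length:
        raise ValueError("need 0 <= n <= length")
    return [1] * n + [0] * (length - n)


def l1_distance(a, b):
    """l1 distance between two truncated sequences of equal length."""
    return sum(abs(s - t) for s, t in zip(a, b))


def check_isometry(max_n=20):
    """Verify ||E(n) - E(m)||_1 == |n - m| for 1 <= n, m <= max_n."""
    return all(
        l1_distance(E(n, max_n), E(m, max_n)) == abs(n - m)
        for n in range(1, max_n + 1)
        for m in range(1, max_n + 1)
    )


print(check_isometry())
```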

23. Define $S : c_0 \to c_0$ by $S(x_1, x_2, \ldots) = (0, x_1, x_2, \ldots)$. That is, $S$ shifts the entries forward and puts $0$ in the empty slot. Show that $S$ is an isometry (into).

24. Let $V$ be a normed vector space. If $y \in V$ is fixed, show that the maps $a \mapsto ay$, from $\mathbb{R}$ into $V$, and $x \mapsto x + y$, from $V$ into $V$, are continuous.

▷ 25. A function $f : (M, d) \to (N, \rho)$ is called Lipschitz if there is a constant $K < \infty$ such that $\rho(f(x), f(y)) \le K d(x, y)$ for all $x, y \in M$. Prove that a Lipschitz mapping is continuous.

26. Provide the answer to a question raised in Chapter Three by showing that integration is continuous. Specifically, show that the map $L(f) = \int_a^b f(t)\,dt$ is Lipschitz with constant $K = b - a$ for $f \in C[a, b]$.

27. Fix $k \ge 1$ and define $f : \ell_\infty \to \mathbb{R}$ by $f(x) = x_k$. Is $f$ continuous? [Hint: $f$ is Lipschitz.]

28. Define $g : \ell_2 \to \mathbb{R}$ by $g(x) = \sum_{n=1}^{\infty} x_n / n$. Is $g$ continuous?

29. Fix $y \in \ell_\infty$ and define $h : \ell_1 \to \ell_1$ by $h(x) = (x_n y_n)_{n=1}^{\infty}$. Show that $h$ is continuous.
▷ 30. Let $f : (M, d) \to (N, \rho)$. Prove that $f$ is continuous if and only if $f(\overline{A}) \subset \overline{f(A)}$ for every $A \subset M$ if and only if $f^{-1}(B^\circ) \subset \big(f^{-1}(B)\big)^\circ$ for every $B \subset N$.

Give an example of a continuous $f$ such that $f(\overline{A}) \ne \overline{f(A)}$ for some $A \subset M$.

31. Let $f : (M, d) \to (N, \rho)$.

(a) If $M = \bigcup_{n=1}^{\infty} U_n$, where each $U_n$ is an open set in $M$, and if $f$ is continuous on each $U_n$, show that $f$ is continuous on $M$.

(b) If $M = \bigcup_{i=1}^{n} E_i$, where each $E_i$ is a closed set in $M$, and if $f$ is continuous on each $E_i$, show that $f$ is continuous on $M$.

(c) Give an example showing that $f$ can fail to be continuous on all of $M$ if, instead, we use a countably infinite union of closed sets $M = \bigcup_{n=1}^{\infty} E_n$ in (b).

32. A real-valued function $f$ on a metric space $M$ is called lower semicontinuous if, for each real $a$, the set $\{x \in M : f(x) \le a\}$ is closed in $M$. (For example, if $g : M \to \mathbb{R}$ is continuous and $x_0 \in M$, then the function $f$ defined by $f(x) = g(x)$ for $x \ne x_0$, and $f(x_0) = g(x_0) - 1$ is lower semicontinuous.) Prove that $f$ is lower semicontinuous if and only if $f(x) \le \liminf_{n \to \infty} f(x_n)$ whenever $x_n \to x$ in $M$. [Hint: For the forward implication, suppose that $x_n \to x$ and $m = \liminf_{n \to \infty} f(x_n) < \infty$. Then, for every $\varepsilon > 0$, the set $\{t \in M : f(t) \le m + \varepsilon\}$ is closed and contains infinitely many $x_n$.]

33. A function $f : M \to \mathbb{R}$ is called upper semicontinuous if $-f$ is lower semicontinuous. Formulate the analogue of Exercise 32 for upper semicontinuous functions.


Theorem 5.1 characterizes continuous functions in terms of open sets and closed sets.
As it happens, we can use these characterizations “in reverse” to derive information 
about open and closed sets. In particular, we can characterize closures in terms of certain 
continuous functions. 

Given a nonempty subset $A$ of $M$ and a point $x \in M$, we define the distance from $x$ to $A$ by
$d(x, A) = \inf\{d(x, a) : a \in A\}$.

Clearly, $0 \le d(x, A) < \infty$ for any $x$ and any $A$, but it is not necessarily true that $d(x, A) > 0$ when $x \notin A$. For example, $d(x, \mathbb{Q}) = 0$ for any $x \in \mathbb{R}$.

Proposition 5.3. $d(x, A) = 0$ if and only if $x \in \overline{A}$.

Proof. $d(x, A) = 0$ if and only if there is a sequence of points $(a_n)$ in $A$ such that $d(x, a_n) \to 0$. But this means that $a_n \to x$ and, hence, $x \in \overline{A}$ by Corollary 4.10. □

Note that Proposition 5.3 has given us another connection between limits in $M$ and limits in $\mathbb{R}$. Loosely speaking, Proposition 5.3 shows that $0$ is a limit point of $\{d(x, a) : a \in A\}$ if and only if $x$ is a limit point of $A$. We can get even more mileage out of this observation by checking that the map $x \mapsto d(x, A)$ is actually continuous. For this it suffices to establish the following inequality:

Proposition 5.4. $|d(x, A) - d(y, A)| \le d(x, y)$.
Proof. $d(x, a) \le d(x, y) + d(y, a)$ for any $a \in A$. But $d(x, A)$ is a lower bound for $d(x, a)$; hence $d(x, A) \le d(x, y) + d(y, a)$. Now, by taking the infimum over $a \in A$, we get $d(x, A) \le d(x, y) + d(y, A)$. Since the roles of $x$ and $y$ are interchangeable, we’re done. □
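For a finite set $A$ the infimum defining $d(x, A)$ is attained, so it can be computed as a minimum. The sketch below assumes a finite nonempty $A$ in the plane with the Euclidean metric (the names `dist_to_set` and `worst_gap` are hypothetical) and spot-checks Proposition 5.4: the quantity $|d(x, A) - d(y, A)| - d(x, y)$ should never be positive, apart from floating-point rounding.

```python
import math
import random


def dist_to_set(x, A):
    """d(x, A) = inf{d(x, a) : a in A} for a finite nonempty A in R^2.

    The Euclidean metric is assumed; for a finite set the infimum is attained,
    so it is computed as a minimum.
    """
    if not A:
        raise ValueError("A must be nonempty")
    return min(math.dist(x, a) for a in A)


def worst_gap(A, trials=5_000, seed=1):
    """Largest value of |d(x, A) - d(y, A)| - d(x, y) over random pairs x, y.

    Proposition 5.4 says this is never positive (up to rounding).
    """
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        x = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        y = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        gap = abs(dist_to_set(x, A) - dist_to_set(y, A)) - math.dist(x, y)
        worst = max(worst, gap)
    return worst


rng = random.Random(0)
A = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(25)]
print(f"largest gap: {worst_gap(A):.3e}")
```

Since the inequality makes $x \mapsto d(x, A)$ Lipschitz with constant $1$, the continuity claimed in the text follows at once.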


To appreciate what this has done for us, let’s make two simple observations. First, if $f : M \to \mathbb{R}$ is a continuous function, then the set $E = \{x \in M : f(x) = 0\}$ is closed. (Why?) Conversely, if $E$ is a closed set in $M$, then $E$ is the “zero set” of some continuous real-valued function on $M$; in particular, since $\overline{E} = E$, Proposition 5.3 gives $E = \{x \in M : d(x, E) = 0\}$, and $x \mapsto d(x, E)$ is continuous by Proposition 5.4. Thus a set $E$ is closed if and only if $E = f^{-1}(\{0\})$ for some continuous function $f : M \to \mathbb{R}$. Conclusion: If you know all of the closed (or open) sets in a metric space $M$, then you know all of the continuous real-valued functions on $M$ (Theorem 5.1). Conversely, if you know all of the continuous real-valued functions on $M$, then you know all of the closed (or open) sets in $M$.
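Here is a small illustration on the line, under the assumption $M = \mathbb{R}$ with the usual metric and $E = [0, 1]$, where $d(x, [a, b]) = \max\{a - x, 0, x - b\}$. The sketch evaluates $x \mapsto d(x, E)$ on a coarse grid and keeps the points where it vanishes, a finite stand-in for the zero set $f^{-1}(\{0\})$. The helper names are hypothetical.

```python
def dist_to_interval(x, a, b):
    """d(x, [a, b]) in R with the usual metric, assuming a <= b."""
    return max(a - x, 0.0, x - b)


def zero_set_on_grid(f, grid, tol=0.0):
    """Grid points where |f| <= tol: a finite stand-in for f^{-1}({0})."""
    return [x for x in grid if abs(f(x)) <= tol]


grid = [k / 4 for k in range(-8, 9)]  # -2.0, -1.75, ..., 2.0
zeros = zero_set_on_grid(lambda x: dist_to_interval(x, 0.0, 1.0), grid)
print(zeros)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The same formula gives the distance to the open interval $(0, 1)$, so that distance function also vanishes exactly on $[0, 1]$, the closure; this is Proposition 5.3 in miniature.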


EXERCISES 

Unless otherwise stated, each of the following exercises is set in a general metric space $(M, d)$.

▷ 34. Show that $d$ is continuous on $M \times M$, where $M \times M$ is supplied with “the” product metric (see Exercise 3.46). This says that $d$ is jointly continuous, that is, continuous as a function of two variables. [Hint: If $x_n \to x$ and $y_n \to y$, show that $d(x_n, y_n) \to d(x, y)$.]

35. Show that a set $U$ is open in $M$ if and only if $U = f^{-1}(V)$ for some continuous function $f : M \to \mathbb{R}$ and some open set $V$ in $\mathbb{R}$.

▷ 36. Suppose that we are given a point $x$ and a sequence $(x_n)$ in a metric space $M$, and suppose that $f(x_n) \to f(x)$ for every continuous, real-valued function $f$ on $M$. Does it follow that $x_n \to x$ in $M$? Explain.

37. If $F$ is closed and $x \notin F$, show that there are disjoint open sets $U$, $V$ with $x \in U$ and $F \subset V$. Can $U$ and $V$ be chosen so that $\overline{U}$ and $\overline{V}$ are disjoint?

38. Given disjoint nonempty closed sets $E$, $F$, define $f : M \to \mathbb{R}$ by $f(x) = d(x, E) / [d(x, E) + d(x, F)]$. Show that $f$ is a continuous function on $M$ with $0 \le f \le 1$, $f^{-1}(\{0\}) = E$, and $f^{-1}(\{1\}) = F$. Use this to find disjoint open sets $U$ and $V$ with $E \subset U$ and $F \subset V$. Can $U$ and $V$ be chosen so that $\overline{U}$ and $\overline{V}$ are disjoint?
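To get a feel for the function in Exercise 38, it may help to evaluate it in one concrete case, assumed here for illustration only: $M = \mathbb{R}$, $E = (-\infty, 0]$, and $F = [1, \infty)$. The sketch below computes $f$ directly from the two distance functions; the names are hypothetical, and closed rays are encoded with `math.inf` endpoints.

```python
import math


def dist_to_interval(x, a, b):
    """d(x, [a, b]) in R; a = -math.inf or b = math.inf gives a closed ray."""
    return max(a - x, 0.0, x - b)


def separating_function(x, E, F):
    """f(x) = d(x, E) / (d(x, E) + d(x, F)) for disjoint closed intervals E, F.

    The denominator never vanishes: d(x, E) = d(x, F) = 0 would put x in both
    E and F, which are disjoint.
    """
    dE = dist_to_interval(x, *E)
    dF = dist_to_interval(x, *F)
    return dE / (dE + dF)


E = (-math.inf, 0.0)  # the closed ray (-inf, 0]
F = (1.0, math.inf)   # the closed ray [1, inf)
for x in (-2.0, 0.0, 0.25, 0.5, 1.0, 3.0):
    print(x, separating_function(x, E, F))
```

In this case $f$ works out to $0$ on $E$, to $x$ on $[0, 1]$, and to $1$ on $F$; the exercise asks for the general argument in an arbitrary metric space.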

39. Show that every open set in $M$ is the union of countably many closed sets, and that every closed set is the intersection of countably many open sets.

40. We define the distance between two nonempty subsets $A$ and $B$ of $M$ by $d(A, B) = \inf\{d(a, b) : a \in A,\ b \in B\}$. Give an example of two disjoint closed sets $A$ and $B$ in $\mathbb{R}^2$ with $d(A, B) = 0$.

41. Let $C$ be a closed set in $\mathbb{R}$ and let $f : C \to \mathbb{R}$ be continuous. Show that there is a continuous function $g : \mathbb{R} \to \mathbb{R}$ with $g(x) = f(x)$ for every $x \in C$. We say that $g$ is a continuous extension of $f$ to all of $\mathbb{R}$. In particular, every continuous function on the Cantor set $\Delta$ extends continuously to all of $\mathbb{R}$. [Hint: The complement of $C$ is the countable union of disjoint open intervals. Define $g$ by “connecting the dots” across each of these open intervals.]

42. Suppose that $f : \mathbb{Q} \to \mathbb{R}$ is Lipschitz. Show that $f$ extends to a continuous function $h : \mathbb{R} \to \mathbb{R}$. Is $h$ unique? Explain. [Hint: Given $x \in \mathbb{R}$, choose a sequence of rationals $(r_n)$ converging to $x$ and argue that $h(x) = \lim_{n \to \infty} f(r_n)$ exists and is actually independent of the sequence $(r_n)$.]


Homeomorphisms 

By now we have seen how the convergent sequences in a metric space determine all of 
its open (or closed) sets and all of its continuous functions. We have also seen how the 
open sets determine which sequences converge and which functions are continuous. 
And we have seen that the continuous functions, in turn, determine the open sets in a 
metric space and so too, indirectly, its convergent sequences. 

Any one of these three (the convergent sequences, the open sets, or the continuous functions) forms the “soul” of a metric space, the essence that distinguishes one metric
space from another in “spirit,” if not in “body.” As a concrete example of this “gestalt,” 
consider Z and N. The algebraic and order properties of Z and N are surely different, but 
as metric spaces Z and N are essentially the same: countably infinite discrete spaces. 
Every subset is open, every real-valued function is continuous, and only (eventually) 
constant sequences converge. From this point of view, Z and N are indistinguishable as 
metric spaces. 

All of this suggests an idea: Two metric spaces might be considered “similar” if 
there is a “similarity” between their open sets, or their convergent sequences, or their 
continuous functions. Not necessarily “identical,” mind you, just “similar.” But how do 
we make this precise? The answer comes from examining our notion of equivalence 
for metrics. 

Suppose that we are handed two metrics, $d$ and $\rho$, on the same set $M$. How do we compare $(M, d)$ and $(M, \rho)$? Well, consider the following list of observations (see Exercises 3.42 and 4.3):

$(M, d)$ and $(M, \rho)$ are “similar”

$\iff$ $d$ and $\rho$ are equivalent metrics on $M$

$\iff$ $d$ and $\rho$ generate the same convergent sequences

$\iff$ $d$ and $\rho$ generate the same open (closed) sets.

Now let’s bring continuous functions into the picture: 

$d$ and $\rho$ are equivalent metrics on $M$

$\iff$ $d$ and $\rho$ generate the same continuous real-valued functions on $M$

$\iff$ $d$ and $\rho$ generate the same continuous functions (with any range) on $M$.
And, finally, let’s consolidate all of these observations into one:

$d$ and $\rho$ are equivalent metrics on $M$

$\iff$ The identity map $i : (M, d) \to (M, \rho)$ and its inverse $i^{-1} : (M, \rho) \to (M, d)$ (also the identity) are both continuous. (Why?)

Generalizing on this last observation, we say that two metric spaces $(M, d)$ and $(N, \rho)$ are homeomorphic (“similar-shape”) if there is a one-to-one and onto map $f : M \to N$ such that both $f$ and $f^{-1}$ are continuous. Such a map $f$ is called a homeomorphism from $M$ onto $N$. Note that $f$ is a homeomorphism if and only if $f^{-1}$ is a homeomorphism (from $N$ onto $M$). You should think of homeomorphic spaces as essentially identical. In particular, if $d$ and $\rho$ are equivalent metrics on $M$, then $(M, d)$ and $(M, \rho)$ are homeomorphic.

Theorem 5.5. Let $f : (M, d) \to (N, \rho)$ be one-to-one and onto. Then the following are equivalent:

(i) $f$ is a homeomorphism.

(ii) $x_n \to x$ in $M$ $\iff$ $f(x_n) \to f(x)$ in $N$.

(iii) $G$ is open in $M$ $\iff$ $f(G)$ is open in $N$.

(iv) $E$ is closed in $M$ $\iff$ $f(E)$ is closed in $N$.

(v) $\tilde{d}(x, y) = \rho(f(x), f(y))$ defines a metric on $M$ equivalent to $d$.

The proof of Theorem 5.5 is left as an exercise. The conclusion to be drawn from this rather long statement is that a homeomorphism provides a correspondence not just between the points of $M$ and $N$, but also between the convergent sequences in $M$ and $N$, as well as between the open and closed sets in $M$ and $N$. There is also a correspondence between the continuous real-valued functions on $M$ and $N$; see Exercise 54.

Let’s look at a few specific examples. 

Examples 5.6 

(a) Note that the relation “is homeomorphic to” is an equivalence relation. In particular, every metric space is homeomorphic to itself (by way of the identity map). More generally, note that $f : M \to N$ is a homeomorphism if and only if $f^{-1} : N \to M$ is a homeomorphism.

(b) From our earlier discussion, we know that if $d$ and $\rho$ are equivalent metrics on $M$, then $(M, d)$ and $(M, \rho)$ are homeomorphic (under the identity map). However, if $(M, d)$ and $(M, \rho)$ are homeomorphic, it does not follow that $d$ and $\rho$ are equivalent; see Exercise 50.

(c) ( R, usual ) is not homeomorphic to ( R, discrete ). Why? (Try to think of more 
than one reason.) But ( N, usual ) is homeomorphic to ( N, discrete ). Check this! 

(d) All three of the spaces $(\mathbb{R}^n, \|\cdot\|_1)$, $(\mathbb{R}^n, \|\cdot\|_2)$, and $(\mathbb{R}^n, \|\cdot\|_\infty)$ are homeomorphic. See Exercises 3.18 and 3.44.

(e) Suppose that $f : M \to N$ is an isometry from $M$ onto $N$; that is, an onto map satisfying $\rho(f(x), f(y)) = d(x, y)$ for all $x, y \in M$. Now an isometry is evidently one-to-one; hence $f$ has an inverse that satisfies $\rho(a, b) = d(f^{-1}(a), f^{-1}(b))$ for all $a, b \in N$. (Why?) That is, $f^{-1}$ is also an isometry. Clearly, then, $f$ is a homeomorphism. In this case, however, we would emphasize the fact that $M$ and $N$ are more than merely “alike” by saying that $M$ and $N$ are isometric. Isometric spaces are exact replicas of one another; they are identical in every feature save the “names” of their elements. For example, the interval $[0, 1]$ is isometric to the interval $[4, 5]$; indeed, it is isometric to any closed interval of length $1$.

(f) In $\mathbb{R}$, any two intervals that look alike are homeomorphic. $[0, 1]$ and $[a, b]$ are homeomorphic, as are $(0, 1)$ and $(a, b)$. The interval $(0, 1)$ is also homeomorphic to $\mathbb{R}$, and $(0, 1]$ is homeomorphic to $[a, b)$. Why? [Hint: The map $x \mapsto 2 - 3x$ is a homeomorphism from $(0, 1]$ onto $[-1, 2)$, while $x \mapsto \arctan x$ is a homeomorphism from $\mathbb{R}$ onto $(-\pi/2, \pi/2)$.]
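The two maps in the hint can be spot-checked numerically: each should land in the claimed interval, and composing with the obvious inverse ($y \mapsto (2 - y)/3$, respectively $\tan$) should return the starting point. The sketch below does this on a handful of sample points; the names `phi` and `phi_inv` are hypothetical, and a finite check is of course no substitute for the continuity argument.

```python
import math


def phi(x):
    """x -> 2 - 3x, which maps (0, 1] onto [-1, 2)."""
    return 2 - 3 * x


def phi_inv(y):
    """The continuous inverse y -> (2 - y) / 3, mapping [-1, 2) onto (0, 1]."""
    return (2 - y) / 3


# Sample points of (0, 1]; x = 1 is included, x = 0 is not.
samples = [0.001, 0.1, 0.5, 0.9, 1.0]
print(all(-1 <= phi(x) < 2 for x in samples))
print(all(math.isclose(phi_inv(phi(x)), x) for x in samples))

# arctan maps R onto (-pi/2, pi/2); tan is its continuous inverse there.
points = [-1e6, -3.0, 0.0, 2.5, 1e6]
angles = [math.atan(x) for x in points]
print(all(-math.pi / 2 < t < math.pi / 2 for t in angles))
print(all(math.isclose(math.tan(t), x, rel_tol=1e-9, abs_tol=1e-12)
          for t, x in zip(angles, points)))
```

Note how the endpoint $x = 1$ goes to the included endpoint $-1$, while points near the excluded endpoint $0$ go near the excluded endpoint $2$.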

(g) Any two intervals that look different are different. For example, $(0, 1]$ is not homeomorphic to $(a, b)$. The argument may be a bit hard to follow, so hang on! Suppose that $(0, 1]$ is homeomorphic to $(a, b)$ under some homeomorphism $f$. Then, by removing $1$ from $(0, 1]$ and its image $c = f(1)$ from $(a, b)$, we would have that $(0, 1)$ is homeomorphic to $(a, c) \cup (c, b)$. (Why should this work?) But $(0, 1)$ is homeomorphic to $\mathbb{R}$, and so $\mathbb{R}$ would have to be homeomorphic to $(a, c) \cup (c, b)$, too. From this it follows that $\mathbb{R}$ could be written as the disjoint union of two nontrivial open sets, which is impossible (see Exercise 4.25). The arguments in the various other cases are similar in spirit.

(h) Although it will take some time before we can explain all of the details, you might find it comforting to know that $\mathbb{R}$ is not homeomorphic to $\mathbb{R}^2$ and that the unit interval $[0, 1]$ is not homeomorphic to the unit square $[0, 1] \times [0, 1]$. More generally, if $m \ne n$, then $\mathbb{R}^n$ and $\mathbb{R}^m$ are not homeomorphic. In other words, spaces with different “dimensions” are apparently different.


EXERCISES 

43. If you are not already convinced, prove that two metrics $d$ and $\rho$ on a set $M$ are equivalent if and only if the identity map on $M$ is a homeomorphism from $(M, d)$ to $(M, \rho)$.

44. Check that the relation “is homeomorphic to” is an equivalence relation on pairs of metric spaces.

45. Prove that $\mathbb{N}$ (with its usual metric) is homeomorphic to $\{1/n : n \ge 1\}$ (with its usual metric).

▷ 46. Show that every metric space is homeomorphic to one of finite diameter. [Hint: Every metric is equivalent to a bounded metric.]

47. Define $E : \mathbb{N} \to \ell_1$ by $E(n) = (1, \ldots, 1, 0, \ldots)$, where the first $n$ entries are $1$ and the rest are $0$. Show that $E$ is an isometry (into).

▷ 48. Prove that $\mathbb{R}$ is homeomorphic to $(0, 1)$ and that $(0, 1)$ is homeomorphic to $(0, \infty)$. Is $\mathbb{R}$ isometric to $(0, 1)$? to $(0, \infty)$? Explain.

49. Let $V$ be a normed vector space. Given a fixed vector $y \in V$, show that the map $f(x) = x + y$ (translation by $y$) is an isometry on $V$. Given a nonzero scalar $a \in \mathbb{R}$, show that the map $g(x) = ax$ (dilation by $a$) is a homeomorphism on $V$.
50. Let $(M, d)$ denote the set $\{0\} \cup \{1/n : n \ge 1\}$ under its usual metric. Define a second metric $\rho$ on $M$ by setting $\rho(1/n, 1/m) = |1/n - 1/m|$ for $m, n \ge 2$, $\rho(1/n, 1) = 1/n$ for $n \ge 2$, $\rho(1/n, 0) = 1 - 1/n$ for $n \ge 2$, and $\rho(0, 1) = 1$. Show that $(M, d)$ and $(M, \rho)$ are homeomorphic but that the identity map from $(M, d)$ to $(M, \rho)$ is not continuous.

51. Let $(M, \rho)$ be a separable metric space and assume that $\rho(x, y) \le 1$ for every $x, y \in M$. Given a countable dense set $\{x_n : n \ge 1\}$ in $M$, define a map $f : M \to H^\infty$, from $M$ into the Hilbert cube (Exercise 3.10), by $f(x) = (\rho(x, x_n))_{n=1}^{\infty}$.

(i) Prove that $f$ is one-to-one and continuous. In fact, $f$ satisfies $d(f(x), f(y)) \le \rho(x, y)$, where $d$ is the metric on $H^\infty$.

(ii) Fix $\varepsilon > 0$ and $x \in M$. Find $\delta > 0$ such that $\rho(x, y) < \varepsilon$ whenever $d(f(x), f(y)) < \delta$. Conclude that $f$ is a homeomorphism into $H^\infty$.


You may find the following simple lemma useful in working the subsequent batch 
of exercises. 

Lemma 5.7. Let $f : L \to M$ and $g : M \to N$, where $L$, $M$, and $N$ are metric spaces. If $f$ is continuous at $x \in L$, and if $g$ is continuous at $f(x) \in M$, then $g \circ f : L \to N$ is continuous at $x \in L$.

Proof. $x_n \to x$ in $L$ $\implies$ $f(x_n) \to f(x)$ in $M$ $\implies$ $g(f(x_n)) \to g(f(x))$ in $N$. □


EXERCISES 

Throughout, $M$ denotes a generic metric space with metric $d$.

▷ 52. Prove Theorem 5.5.

▷ 53. Suppose that we are given a point $x$ and a sequence $(x_n)$ in a metric space $M$, and suppose that $f(x_n) \to f(x)$ for every continuous real-valued function $f$ on $M$. Prove that $x_n \to x$ in $M$.

▷ 54. Let $f : (M, d) \to (N, \rho)$ be one-to-one and onto. Prove that the following are equivalent: (i) $f$ is a homeomorphism and (ii) $g : N \to \mathbb{R}$ is continuous if and only if $g \circ f : M \to \mathbb{R}$ is continuous. [Hint: Use the characterization given in Theorem 5.5 (ii).]

55. Let $f : (M, d) \to (N, \rho)$ be a homeomorphism. Prove that $M$ is separable if and only if $N$ is separable.

▷ 56. Let $f : (M, d) \to (N, \rho)$.

(i) We say that $f$ is an open map if $f(U)$ is open in $N$ whenever $U$ is open in $M$; that is, $f$ maps open sets to open sets. Give examples of a continuous map that is not open and an open map that is not continuous. [Hint: Please note that the definition depends on the target space $N$.]

(ii) Similarly, $f$ is called closed if it maps closed sets to closed sets. Give examples of a continuous map that is not closed and a closed map that is not continuous.
▷ 57. Let $f : (M, d) \to (N, \rho)$ be one-to-one and onto. Show that the following are equivalent: (i) $f$ is open; (ii) $f$ is closed; and (iii) $f^{-1}$ is continuous. Consequently, $f$ is a homeomorphism if and only if both $f$ and $f^{-1}$ are open (closed).

58. Let $f : (M, d) \to (N, \rho)$ be one-to-one and onto. Prove that $f$ is a homeomorphism if and only if $f(\overline{A}) = \overline{f(A)}$ for every subset $A$ of $M$.

59.

(a) Show that an open, continuous map need not be closed, even if it is onto. [Hint: Consider the map $\pi(x, y) = x$ from $\mathbb{R}^2$ onto $\mathbb{R}$.]

(b) Show that a closed, continuous map need not be open, even if it is onto. [Hint: Consider the map $x \mapsto \cos x$ from $[0, 2\pi]$ onto $[-1, 1]$.]

60. Let $(M, d)$ be a metric space, and let $\tau$ be the discrete metric on $M$. Show that $(M, d)$ and $(M, \tau)$ are homeomorphic if and only if every subset of $M$ is open in $(M, d)$ if and only if every function $f : (M, d) \to \mathbb{R}$ is continuous.

61. Show that $\mathbb{N}$ is homeomorphic to the set $\{e^{(n)} : n \ge 1\}$ when considered as a subset of any one of the spaces $c_0$, $\ell_1$, $\ell_2$, or $\ell_\infty$. [Hint: The map $n \mapsto e^{(n)}$ is continuous and open. Why?] If we instead take the discrete metric on $\mathbb{N}$, show that the map $n \mapsto e^{(n)}$ is an isometry into $c_0$.


Perhaps you have heard the word topology? Well, now you know something about it! Topology is the study of continuous transformations or, what amounts to the same thing, the study of open sets. This rather loose description will have to do for now. In any case, a property that can be characterized solely in terms of open sets is usually referred to as a topological property. In other words, a topological property is one that is preserved by homeomorphisms. For example, separability (having a countable dense subset) is a topological property, while boundedness is not (see Exercises 55 and 46). And Example 5.6 (h) would seem to suggest that the “dimension” of a space is a topological property. The word topology is also used as the name for the collection of all open sets. For example, we might say that convergence and continuity in $M$ depend on the topology of $M$. This description is more to the point than saying that either depends on the metric of $M$.

From this point on we will be very much concerned with whether or not a given 
property is preserved by homeomorphisms. Such properties are invariant under slight 
changes in the metric and so are typically more “forgiving” than those that depend 
intimately on a particular metric. 


The Space of Continuous Functions 

We write $C(M)$ for the collection of all continuous, real-valued functions on $(M, d)$. As we have seen, the collection $C(M)$ contains a wealth of information about the metric space $(M, d)$ itself. This being a course in analysis (or had you forgotten?), we want to know everything possible about continuous functions on metric spaces. Since we are allowed to focus our attention on real-valued functions, $C(M)$ is the space that we need to master. We will find that $C(M)$ comes equipped with an incredible amount of algebraic structure, all inherited from $\mathbb{R}$. We will show that $C(M)$ is a vector space, an algebra, and a lattice. One of our goals will be to find a metric (or norm) on $C(M)$ that is compatible with its algebraic structure. While this will take no small effort on our part, it is well worth it. The scenery alone more than justifies the trip; analysis, algebra, and topology all flourish in $C(M)$.

Given real-valued functions $f, g : M \to \mathbb{R}$, we define all of the usual algebraic operations on $f$ and $g$ “pointwise.” That is, we define $c \cdot f$, $c \in \mathbb{R}$, $f + g$, and $f \cdot g$ by $(c \cdot f)(x) = cf(x)$, $(f + g)(x) = f(x) + g(x)$, and $(f \cdot g)(x) = f(x)g(x)$, for all $x \in M$. In this way, the ring structure of $\mathbb{R}$ “lifts” to the real-valued functions on $M$. The order structure of $\mathbb{R}$ also lifts: We define $f \le g$ to mean that $f(x) \le g(x)$ for all $x \in M$. From here we can make sense out of all sorts of expressions, for example, $|f|(x) = |f(x)|$, $\max\{f, g\}(x) = \max\{f(x), g(x)\}$, and $\min\{f, g\}(x) = \min\{f(x), g(x)\}$.

Now if $M$ is a metric space, what we would like to know is whether the space $C(M)$ is “closed” under all of these various operations. You won’t be surprised to learn that the answer is: Yes. For example, it follows from Lemma 5.7 that if $f : M \to \mathbb{R}$ is continuous, then so are $cf$, $|f|$, $f^2$, $\sin(f)$, and so on. (How?) The other cases that we want to consider are slightly more elaborate compositions involving two functions at a time, such as $x \mapsto (f(x), g(x)) \mapsto f(x) + g(x)$. Another easy lemma will make short work of the details.

Lemma 5.8. If $f, g : M \to \mathbb{R}$ are continuous, then so is the function $h : M \to \mathbb{R}^2$, defined by $h(x) = (f(x), g(x))$ for $x \in M$.

Proof. $x_n \to x$ in $M$ $\implies$ $f(x_n) \to f(x)$ and $g(x_n) \to g(x)$ in $\mathbb{R}$ $\implies$ $h(x_n) \to h(x)$ in $\mathbb{R}^2$. (Why?) □


Here’s the plan of attack: Each of the functions $f + g$, $f \cdot g$, $\max\{f, g\}$, and $\min\{f, g\}$ is the composition of two functions. First, $x \mapsto (f(x), g(x))$, and then the pair $(f(x), g(x))$ in $\mathbb{R}^2$ is mapped to $f(x) + g(x)$, or $f(x)g(x)$, or $\max\{f(x), g(x)\}$, or $\min\{f(x), g(x)\}$. If $f$ and $g$ are continuous, then the first map is always continuous by Lemma 5.8, and so we only need to know whether the second map is continuous from $\mathbb{R}^2$ into $\mathbb{R}$ in each of the four cases. Here are some of the details (you may want to supply a few more).

Examples 5.9 

(a) The map $(x, y) \mapsto x + y$ is continuous: If $x_n \to x$ and $y_n \to y$ in $\mathbb{R}$, then $x_n + y_n \to x + y$ because $|(x_n + y_n) - (x + y)| \le |x_n - x| + |y_n - y|$. Alternatively, you might show that the set $\{(x, y) : |(x + y) - (a + b)| < \varepsilon\}$ is open in $\mathbb{R}^2$.

(b) The map $(x, y) \mapsto \max\{x, y\}$ is continuous; an easy way to see this is to write $\max\{x, y\} = \frac{1}{2}(x + y + |x - y|)$. (How does this help?) For $(x, y) \mapsto \min\{x, y\}$, use the fact that $\min\{x, y\} = \frac{1}{2}(x + y - |x - y|)$.

(c) The map $(x, y) \mapsto xy$ is continuous since $xy = \frac{1}{4}[(x + y)^2 - (x - y)^2]$. (How would a “direct” proof go?)
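The three identities in Examples 5.9 are what reduce max, min, and products to sums, absolute values, and squares, whose continuity is already in hand. The sketch below (function names hypothetical) checks them on a grid of integers, where every intermediate value is exact in floating point, so exact equality is a fair test.

```python
def max_via_abs(x, y):
    """max{x, y} = (x + y + |x - y|) / 2, as in Example 5.9(b)."""
    return (x + y + abs(x - y)) / 2


def min_via_abs(x, y):
    """min{x, y} = (x + y - |x - y|) / 2, as in Example 5.9(b)."""
    return (x + y - abs(x - y)) / 2


def product_via_squares(x, y):
    """xy = [(x + y)^2 - (x - y)^2] / 4, as in Example 5.9(c)."""
    return ((x + y) ** 2 - (x - y) ** 2) / 4


# Integer inputs keep every intermediate value exact, so == is safe here.
grid = range(-6, 7)
print(all(max_via_abs(x, y) == max(x, y) for x in grid for y in grid))
print(all(min_via_abs(x, y) == min(x, y) for x in grid for y in grid))
print(all(product_via_squares(x, y) == x * y for x in grid for y in grid))
```

The design point is the one made in the text: each right-hand side is built only from operations already known to be continuous, so continuity of the left-hand side comes for free.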
Combining these observations with Lemma 5.8 gives:

Theorem 5.10. Let $f, g : M \to \mathbb{R}$ be continuous. Then $f \pm g$, $f \cdot g$, $\max\{f, g\}$, and $\min\{f, g\}$ are all continuous.

If we use the pointwise definitions for algebraic operations in C(M), then C(M) 
becomes a vector space (it is closed under addition and scalar multiplication), an algebra 
(or ring, since it is also closed under products), and a lattice (each pair of functions has a
max and a min back in C(M)). The most important observation for now is that C(M) is 
a vector space; we will have much more to say about the lattice and ring structures later. 

Our next task is to determine, if possible, a metric or a norm on $C(M)$ that will be compatible with these algebraic operations. We have been given a hint as to how to do this by Fréchet himself. The norm of choice on $C[a, b]$ is apparently $\|f\|_\infty = \max_{a \le t \le b} |f(t)|$. We have already checked that this is, in fact, a norm on $C[a, b]$ (that is, it “respects” the vector space operations in $C[a, b]$). That this norm does still more is outlined in the following exercise.


EXERCISE 

62. If $f, g \in C[a, b]$, show that $\|fg\|_\infty \le \|f\|_\infty \|g\|_\infty$. Also show that $\|\max\{f, g\}\|_\infty \le \max\{\|f\|_\infty, \|g\|_\infty\}$, and that $\|f\|_\infty \le \|g\|_\infty$ whenever $|f| \le |g|$.
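The inequalities in Exercise 62 can be watched in action by estimating $\|\cdot\|_\infty$ on a fine grid. The sketch below assumes a uniform grid approximation (which can only undershoot the true maximum) and uses $f = \sin$ with a hypothetical second function $g(t) = 1 + t$ on $[0, 2\pi]$. Because both sides are maxima over the same grid, the inequalities hold exactly for the estimates too.

```python
import math


def sup_norm(f, a, b, samples=10_001):
    """Grid estimate of ||f||_inf = max_{a <= t <= b} |f(t)|.

    Uses `samples` equally spaced points (samples >= 2). The estimate can only
    undershoot the true maximum; it is exact when the maximum is a grid point.
    """
    step = (b - a) / (samples - 1)
    return max(abs(f(a + k * step)) for k in range(samples))


a, b = 0.0, 2 * math.pi
f = math.sin


def g(t):
    """Hypothetical second element of C[0, 2*pi]."""
    return 1 + t


def fg(t):
    """Pointwise product f * g."""
    return f(t) * g(t)


def f_max_g(t):
    """Pointwise maximum max{f, g}."""
    return max(f(t), g(t))


print(sup_norm(fg, a, b) <= sup_norm(f, a, b) * sup_norm(g, a, b))
print(sup_norm(f_max_g, a, b) <= max(sup_norm(f, a, b), sup_norm(g, a, b)))
```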


We know that homeomorphic spaces are supposed to have (essentially) the same 
collection of continuous functions. Let’s make this even more precise in at least one 
special case. 


EXERCISE 

63. Let $[a, b]$ be any closed, bounded interval in $\mathbb{R}$, and let $\sigma : [0, 1] \to [a, b]$ be defined by $\sigma(t) = a + t(b - a)$. Prove that:

(i) $\sigma$ is a homeomorphism.

(ii) $f \in C[a, b]$ if (and only if) $f \circ \sigma \in C[0, 1]$.

(iii) The map $f \mapsto f \circ \sigma$ is an isometry from $C[a, b]$ onto $C[0, 1]$. The map $T(f) = f \circ \sigma$ actually does much more; it is both an algebra and a lattice isomorphism. That is, it also preserves the algebraic and order structures. Specifically, given any $f, g \in C[a, b]$, check that:

(iv) $T(\alpha f + \beta g) = \alpha T(f) + \beta T(g)$ for all $\alpha, \beta \in \mathbb{R}$.

(v) $T(fg) = T(f)T(g)$.

(vi) $T(f) \le T(g)$ if and only if $f \le g$.

Thus, for all practical purposes, $C[a, b]$ and $C[0, 1]$ are identical.
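The map $T(f) = f \circ \sigma$ of Exercise 63 is easy to write down as a higher-order function. The sketch below is illustrative only: it uses a grid estimate of the sup norm, a hypothetical interval $[1, 3]$, and hypothetical functions $f(x) = x^2 - 4$ and $g = \cos$. It compares $\|f\|_\infty$ with $\|T(f)\|_\infty$ and checks that $T$ respects products and linear combinations at sample points.

```python
import math


def make_sigma(a, b):
    """sigma(t) = a + t(b - a), the affine map of [0, 1] onto [a, b]."""
    return lambda t: a + t * (b - a)


def T(f, a, b):
    """T(f) = f o sigma: carries a function on [a, b] to one on [0, 1]."""
    sigma = make_sigma(a, b)
    return lambda t: f(sigma(t))


def sup_norm(f, lo, hi, samples=2_001):
    """Grid estimate of max_{lo <= t <= hi} |f(t)| (samples >= 2)."""
    step = (hi - lo) / (samples - 1)
    return max(abs(f(lo + k * step)) for k in range(samples))


a, b = 1.0, 3.0
f = lambda x: x * x - 4.0  # hypothetical element of C[1, 3]
g = math.cos               # hypothetical element of C[1, 3]
Tf, Tg = T(f, a, b), T(g, a, b)

print(sup_norm(f, a, b), sup_norm(Tf, 0.0, 1.0))  # nearly equal norms
t = 0.3
print(math.isclose(T(lambda x: f(x) * g(x), a, b)(t), Tf(t) * Tg(t)))
```

Because $\sigma$ is a bijection, $T(f)$ takes exactly the same values as $f$, just at relabeled points; that is why the two norms agree.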


But will the norm on $C[a, b]$ give any clues to a possible norm on $C(\mathbb{R})$? Since the elements of $C(\mathbb{R})$ need not be bounded (let alone actually attain a maximum value), we cannot expect to use $\sup_{t \in \mathbb{R}} |f(t)|$, for example. A norm may be too much to hope for, but it is easy enough to define a metric on $C(\mathbb{R})$. This, too, comes to us from Fréchet.


EXERCISE 

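64. Given $n \in \mathbb{N}$ and $f, g \in C(\mathbb{R})$, let $d_n(f, g) = \max_{|t| \le n} |f(t) - g(t)|$. Then $d_n$ defines a pseudometric on $C(\mathbb{R})$. (Why?) Show that $d(f, g) = \sum_{n=1}^{\infty} 2^{-n} d_n(f, g) / (1 + d_n(f, g))$ defines a metric on $C(\mathbb{R})$.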
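Fréchet’s metric can be approximated numerically by truncating the series (each term is below $2^{-n}$, so dropping the terms with $n > N$ changes the value by less than $2^{-N}$) and estimating each $d_n$ on a grid over $[-n, n]$. The sketch below makes both approximations; the names and the cutoff $N = 30$ are illustrative assumptions, not part of the exercise.

```python
import math


def d_n(f, g, n, samples=2_001):
    """Grid estimate of d_n(f, g) = max_{|t| <= n} |f(t) - g(t)|."""
    step = 2 * n / (samples - 1)
    return max(abs(f(-n + k * step) - g(-n + k * step)) for k in range(samples))


def frechet_metric(f, g, n_max=30):
    """Truncation of d(f, g) = sum_{n >= 1} 2^-n d_n / (1 + d_n).

    Each omitted term is below 2^-n, so the truncation error is below 2^-n_max.
    """
    total = 0.0
    for n in range(1, n_max + 1):
        dn = d_n(f, g, n)
        total += 2.0 ** -n * dn / (1 + dn)
    return total


# t^2 is unbounded on R, yet its distance from 0 in this metric is below 1.
print(frechet_metric(lambda t: t * t, lambda t: 0.0))
```

A single $d_n$ is only a pseudometric, since two functions that agree on $[-n, n]$ have $d_n$-distance $0$; the weighted sum over all $n$ is what separates distinct functions.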


It will take quite a bit more work before we can settle the issue of a reasonable metric on $C(M)$, even in a few special cases. But at least one case is easy to describe. If $M$ is a finite set, say $M = \{x_1, \ldots, x_n\}$ (under any metric), then we may identify $C(M)$ with $\mathbb{R}^n$ by identifying each $f \in C(M)$ with its range $(f(x_1), \ldots, f(x_n)) \in \mathbb{R}^n$. Why does this work? Recall that any metric on a finite set is necessarily equivalent to the discrete metric, and so every function $f : M \to \mathbb{R}$ is continuous. Thus, each $y \in \mathbb{R}^n$ defines an element $f \in C(M)$ by setting $f(x_k) = y_k$, for $k = 1, \ldots, n$.

If we use coordinatewise operations on $\mathbb{R}^n$, this correspondence even preserves the algebraic structure on $C(M)$. For example, check that if $f, g \in C(M)$, then $f + g$ corresponds to the vector $(f(x_1) + g(x_1), \ldots, f(x_n) + g(x_n))$. Similarly, $fg$ corresponds to $(f(x_1)g(x_1), \ldots, f(x_n)g(x_n))$ and $|f|$ corresponds to $(|f(x_1)|, \ldots, |f(x_n)|)$. Finally, we can induce a suitable norm on $C(M)$ by taking the “max” norm on $\mathbb{R}^n$. Specifically, check that $\|f\|_\infty = \max_{1 \le i \le n} |f(x_i)|$, the norm induced on $C(M)$ by this correspondence, satisfies $\|fg\|_\infty \le \|f\|_\infty \|g\|_\infty$ and $\|f\|_\infty \le \|g\|_\infty$ whenever $|f| \le |g|$.
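The correspondence between $C(M)$ and $\mathbb{R}^n$ for a finite $M$ can be made completely concrete by storing a function as a dictionary from points to values. In the sketch below the three point labels and the sample values are hypothetical; the point is that pointwise operations on functions become coordinatewise operations on vectors, and the max norm behaves as described.

```python
# A finite metric space M = {x_1, x_2, x_3}; the point labels are hypothetical.
M = ["x1", "x2", "x3"]


def to_vector(f):
    """Identify f in C(M) with (f(x_1), ..., f(x_n)) in R^n."""
    return tuple(f[x] for x in M)


def from_vector(y):
    """Each y in R^n defines f in C(M) by f(x_k) = y_k; every such f is continuous."""
    return dict(zip(M, y))


def max_norm(f):
    """||f||_inf = max_i |f(x_i)|, induced by the max norm on R^n."""
    return max(abs(f[x]) for x in M)


def pointwise(op, f, g):
    """Pointwise combination (f op g)(x) = op(f(x), g(x))."""
    return {x: op(f[x], g[x]) for x in M}


f = from_vector((1.5, -2.0, 0.5))
g = from_vector((2.0, 1.0, -3.0))
fg = pointwise(lambda s, t: s * t, f, g)

print(to_vector(pointwise(lambda s, t: s + t, f, g)))  # (3.5, -1.0, -2.5)
print(to_vector(fg))                                   # (3.0, -2.0, -1.5)
print(max_norm(fg) <= max_norm(f) * max_norm(g))
```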

Our goal is now a little clearer: To define a norm on C(M), we want M to be “like” 
a finite set. Whatever that might mean, we would certainly hope that [a, b] turns out 
to be “like” a finite set (after all, that case works just fine already). We will put these 
issues aside for now, but they will resurface in Chapter Eight when we finally arrive 
at a plausible generalization of finite sets (which really will include [ a, b ] as a special 
case). 


Notes and Remarks 

The so-called Lipschitz condition of Exercises 19 and 25 was introduced by Rudolph 
Lipschitz in 1876 (for more on this, see the discussion in Chapter Seven). 

The definition of continuity in terms of open sets is due to Hausdorff. For various 
notions of “almost” continuous and “nearly” continuous functions, based on variations 
of Exercise 30, see Beslin [1992] and Tong [1992]. 

Semicontinuity (Exercises 32 and 33) was introduced by René Baire in his thesis, Baire [1899]. Also see Radó [1942]. Related to Exercises 7 and 32 is the intoxicating article by Foster Brooks [1971], where sets of the form $\{x : f(x) > a\}$ are called “cut sets.”

For more on the notion of “dimension,” which was referred to in passing in Example 5.6 (h), see Menger [1943].
The algebraic and lattice structures of the space C(M) have been the topic of a great deal of research during the last 50 years. For much more on this, see Birkhoff [1948], Gillman and Jerison [1960], Goffman and Pedrick [1965], Jameson [1974], Kuller [1969], Schaefer [1980], and Simmons [1963]. The short note by Aron and Fricke [1986] provides an elementary proof of the fact that a linear, multiplicative map $\varphi : C(\mathbb{R}) \to \mathbb{R}$ (i.e., an algebra homomorphism) is given by point evaluation. Compare this with Exercise 63.
Connected Sets

We have a few details to clean up before we move on to other things. These concern the special role of intervals in $\mathbb{R}$ and their use in characterizing the open sets in $\mathbb{R}$ given in Chapter Four (see Theorem 4.6 and Exercise 4.25). As we’ll see in this section, a better understanding of the special nature of intervals in $\mathbb{R}$ will allow us to generalize the intermediate value theorem of calculus. The intermediate value theorem is the formal statement of the informal notion that the graph of a continuous function is “unbroken.” The historical basis of the theorem is the concept of a function as measuring, over time, the relative position of an object moving along a straight line. Thus, if we track the position $y = f(x)$ of a moving object between time $x = a$ and some subsequent time $x = b$, we would expect the object to “visit” all of the positions y that are intermediate to $f(a)$ and $f(b)$. In short, the continuous image of the time interval $[a, b]$ should contain (at least) the full interval of positions between $f(a)$ and $f(b)$.

The secret here is the intuitively obvious fact that no interval in $\mathbb{R}$ can be split into two relatively open parts. Let’s prove this by “brute force” for the interval $[a, b]$ (we’ll do the other cases shortly).

Suppose to the contrary that $[a, b] = A \cup B$, where A and B are nonempty, disjoint, relatively open sets in $[a, b]$. We are going to find a contradiction by examining the “border” between A and B. The trouble comes from the fact that A and B are necessarily also closed in $[a, b]$, since each is the complement of an open set: $A = [a, b] \setminus B$ and $B = [a, b] \setminus A$. So each of A and B lays claim to the “border.”

To get started, we might as well assume that $b \in B$. Then $(b - \varepsilon, b] \subset B$ for some $\varepsilon > 0$, since B is open. Now let $c = \sup A$. Clearly, $a \le c \le b$, but note that, since A and B are open in $[a, b]$, we actually have $a < c < b$. (Why?) Next, it follows from the definition of c that $(c - \varepsilon, c] \cap A \ne \emptyset$ and $(c, c + \varepsilon) \cap B \ne \emptyset$ for any $\varepsilon > 0$; in fact, $(c, b] \subset B$. That is, $c \in \overline{A}$ and $c \in \overline{B}$. But then, $c \in \overline{A} \cap \overline{B} = A \cap B = \emptyset$. This contradiction shows that no such splitting of $[a, b]$ into nonempty, disjoint, open sets is available.

Based on this observation, we say that a metric space M is disconnected (or not connected) if M can be split into the union of two nontrivial open sets, that is, if there are nonempty open sets A and B in M with $A \cap B = \emptyset$ and $A \cup B = M$. The pair of open sets A and B is called a disconnection of M. We say that M is connected if no such disconnection can be found. Thus, for example, $[a, b]$ is connected.
Notice that we could just as well have used closed sets in our definition. If a disconnection A, B exists, then the disconnecting sets are also closed: $A = B^c$ and $B = A^c$. That is, A and B are clopen (simultaneously open and closed) sets. Conversely, if M contains a nontrivial clopen subset A (other than $\emptyset$ or M), then A and $A^c$ are a disconnection for M. This gives us our first theorem:

Theorem 6.1. M is connected if and only if M contains no nontrivial clopen sets. 

Examples 6.2 

(a) R is connected. (This follows from Exercise 4.25, but we will give another proof 
shortly based on the fact that [ a, b ] is connected.) 

(b) A discrete space containing two or more points is disconnected. 

(c) The empty set $\emptyset$ and any one-point space are connected (by default).

(d) The Cantor set $\Delta$ is (very!) disconnected. Indeed, it follows from Exercise 2.22 that for any $x, y \in \Delta$ with $x < y$ there is a $z \notin \Delta$ such that $x < z < y$. Thus, $\Delta$ is disconnected by the (relatively) open sets $A = [0, z) \cap \Delta$ and $B = (z, 1] \cap \Delta$.
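For points of $\Delta$ with finite ternary expansions, the point z in Example 6.2 (d) can be found explicitly. The Python sketch below works under that assumption, using exact rational arithmetic. It finds the first ternary digit at which x and y differ and returns the midpoint of the middle third removed at that stage. The names `cantor_value` and `gap_point` are hypothetical.

```python
from fractions import Fraction
from typing import List

def cantor_value(digits: List[int]) -> Fraction:
    """Value of 0.d_1 d_2 ... d_k (base 3) for a finite list of digits."""
    return sum((Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits)), Fraction(0))

def gap_point(x: List[int], y: List[int]) -> Fraction:
    """Given points x < y of the Cantor set, written as finite ternary expansions
    using only the digits 0 and 2, return z with x < z < y and z not in the Cantor set."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))                      # pad with trailing zeros
    y = y + [0] * (n - len(y))
    k = next(i for i in range(n) if x[i] != y[i])   # first differing digit
    if not (x[k] == 0 and y[k] == 2):
        raise ValueError("expected x < y, with ternary digits 0 and 2 only")
    prefix = cantor_value(x[:k])
    # The open middle third (prefix + 1/3^(k+1), prefix + 2/3^(k+1)) was removed at
    # stage k + 1; x lies to its left and y to its right. Return its midpoint.
    return prefix + Fraction(3, 2 * 3 ** (k + 1))

print(gap_point([0, 2], [2]))        # 1/2
print(gap_point([2, 0, 2], [2, 2]))  # 5/6
```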

Our terminology for connectedness is unavoidably fussy. After all, we have defined connectedness in terms of what it is not, namely, disconnected. To make matters worse, at least on the surface, Example 6.2 (d) and our proof that $[a, b]$ is connected both suggest the frightening prospect of “relatively connected” as an altogether separate notion. Well, fear not! Connectedness is not a relative property for metric spaces. To see why, we will need to face the relative definition head-on.

A subset E of a metric space M is disconnected in E if there exist disjoint, nonempty, open (in E) sets U and V such that $E = U \cup V$. It follows immediately that we get a pair of open sets A and B in M such that $U = A \cap E$ and $V = B \cap E$. And so “unrelating” the relative definition, by writing it in terms of A and B, yields: $A \cap E \ne \emptyset$, $B \cap E \ne \emptyset$, $(A \cap E) \cap (B \cap E) = \emptyset$, and $E = (A \cap E) \cup (B \cap E)$, or $E \subset A \cup B$. (Phew!) This mess would be greatly simplified if we could take A and B to be disjoint in M. While this need not hold true in more general settings, luck is with us in a metric space.

Lemma 6.3. Let E be a subset of a metric space M. If U and V are disjoint open sets in E, then there are disjoint open sets A and B in M such that $U = A \cap E$ and $V = B \cap E$.

Proof. We will only sketch the proof, leaving the full details as an exercise. The hard work here is largely a matter of notational bookkeeping. To spare us much of this notation, let’s avoid the relative metric wherever possible. We will state everything in terms of open balls in M, using the simple notation $B_\varepsilon(x)$ in place of the more cumbersome $B_\varepsilon^M(x)$.

For each $x \in U$ there is an $\varepsilon_x > 0$ such that $E \cap B_{\varepsilon_x}(x) \subset U$, because U is open in E. Likewise, for each $y \in V$ there is a $\delta_y > 0$ such that $E \cap B_{\delta_y}(y) \subset V$. Since $U \cap V = \emptyset$, we also get $E \cap B_{\varepsilon_x}(x) \cap B_{\delta_y}(y) = \emptyset$. We would like to get rid of the set E in this conclusion, and we can do so at a small price:
Claim. $B_{\varepsilon_x/2}(x) \cap B_{\delta_y/2}(y) = \emptyset$ for every $x \in U$ and $y \in V$. (Just check.)

Thus $A = \bigcup \{B_{\varepsilon_x/2}(x) : x \in U\}$ and $B = \bigcup \{B_{\delta_y/2}(y) : y \in V\}$ will work. □

The conclusion to be drawn from Lemma 6.3 is that E is disconnected (in E) if and only if there exist disjoint, nonempty, open sets A and B in M such that $A \cap E \ne \emptyset$, $B \cap E \ne \emptyset$, and $E \subset A \cup B$. And it does not matter whether we take “open” to mean “open in E” or “open in M.” That is, this statement reduces to the original definition in case $E = M$, and it gives the correct “relative” definition in any case (by taking $U = A \cap E$ and $V = B \cap E$). Thus, there is no harm in simply taking it as our new definition of a disconnected set, as opposed to a disconnected space. In other words, we have dodged a bullet! By adopting this harmless rewording of the definition of disconnected, and hence also a rewording of the definition of connected, we have freed the concept from any apparent dependence on the relative metric. We would be foolish to do otherwise.

Henceforth, when considering a subset E of a given metric space M, we will call a pair of disjoint open sets A and B a disconnection of E if $A \cap E \ne \emptyset$, $B \cap E \ne \emptyset$, and $E \subset A \cup B$. And, of course, we will say that E is a connected set if no such disconnection of E can be found.

Let’s put this new definition to use by giving another characterization of the intervals 
in R. 

Theorem 6.4. A subset E of $\mathbb{R}$, containing more than one point, is connected if and only if, whenever $x, y \in E$ with $x < y$, we also have $[x, y] \subset E$. That is, the connected subsets of $\mathbb{R}$ (containing more than one point) are precisely the intervals.

Proof. One direction is easy: If there exist points $x < z < y$ such that $x, y \in E$ but $z \notin E$, then $E \subset (-\infty, z) \cup (z, +\infty)$. That is, $A = (-\infty, z)$ and $B = (z, +\infty)$ is a disconnection of E.

For the other direction, suppose that E satisfies the condition that $[x, y] \subset E$ whenever $x, y \in E$ with $x < y$, but that E is disconnected. Then there are disjoint open sets A and B in $\mathbb{R}$ such that $A \cap E \ne \emptyset$, $B \cap E \ne \emptyset$, and $E \subset A \cup B$. Given points $a \in A \cap E$ and $b \in B \cap E$, we might as well assume that $a < b$ and hence that $[a, b] \subset E$. But now $[a, b] \subset E \subset A \cup B$; that is, A and B are a disconnection of the interval $[a, b]$. This contradicts the fact that $[a, b]$ is connected. Hence E is connected.

Finally, suppose that E satisfies $[x, y] \subset E$ whenever $x, y \in E$ with $x < y$. We want to prove that E is an interval. It follows from this condition that E contains the open interval $(\inf E, \sup E)$, where we include the possibilities $\inf E = -\infty$ and $\sup E = +\infty$. (Why?) Thus, E must be an interval. Which particular type of interval it is depends on whether $\inf E$ and $\sup E$ are finite, and on whether they belong to E. □

We can now shed some light on the structure of open sets in $\mathbb{R}$. The proof of Theorem 4.6 shows that each nonempty open set U in $\mathbb{R}$ can be uniquely written as the disjoint union of its maximal connected subsets. Indeed, we wrote an open set in terms of “maximal” intervals $I_x$, and such intervals are actually maximal with respect to being connected subsets of U (i.e., no larger subset of U will be connected). At each $x \in U$, we took $I_x$ to be the union of all of the open subintervals in U that contain x. Thus, $I_x$ is both open and connected (see Exercise 6), and hence it is an open interval. The remainder of the proof of Theorem 4.6 shows that two such connected “components” of U are either identical or disjoint. There are at most countably many distinct $I_x$, and their union must be all of U.
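For a finite union of open intervals, these components can be computed by sorting and merging. The Python sketch below assumes the input is a finite list of open intervals (a, b) with a < b. It illustrates the decomposition only and is not the countable construction used in the proof of Theorem 4.6. The name `components` is hypothetical. Two intervals that share only an endpoint stay separate, because that endpoint does not belong to the union.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # the open interval (a, b), with a < b

def components(intervals: List[Interval]) -> List[Interval]:
    """Connected components of a finite union of open intervals in R,
    returned as disjoint open intervals in increasing order."""
    result: List[Interval] = []
    for a, b in sorted(intervals):
        if result and a < result[-1][1]:
            # (a, b) overlaps the last component, so their union is connected: merge.
            lo, hi = result[-1]
            result[-1] = (lo, max(hi, b))
        else:
            # a >= right end of the last component; that endpoint is not in the
            # union, so (a, b) starts a new component.
            result.append((a, b))
    return result

print(components([(3, 4), (0, 1), (0.5, 2), (1.5, 1.8), (2, 3)]))
# [(0, 2), (2, 3), (3, 4)]
```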

Given any set E, we call the maximal (with respect to containment) connected
subsets of E the connected components of E. Essentially the same line of reasoning
as above shows that every set can be written (uniquely) as the disjoint union of 
its connected components. A connected set, then, is a set with only one connected 
component (namely, itself). 


EXERCISES 

Except where noted, each of the following exercises is set in a generic metric space M with metric d.

1. Supply the missing details in the proof of Lemma 6.3. 

2. Show that the only nonempty connected subsets of $\Delta$ are singletons. (We would say that $\Delta$ is totally disconnected.)

3. If E is a connected subset of M, and if A and B are disjoint open sets in M with $E \subset A \cup B$, prove that either $E \subset A$ or $E \subset B$.

4. Prove that E is disconnected if and only if there exist nonempty sets A and B in M satisfying $A \cap \overline{B} = \emptyset$, $\overline{A} \cap B = \emptyset$, and $E = A \cup B$.

> 5. If E and F are connected subsets of M with $E \cap F \ne \emptyset$, show that $E \cup F$ is connected.

> 6. More generally, if $\mathcal{C}$ is a collection of connected subsets of M, all having a point in common, prove that $\bigcup \mathcal{C}$ is connected. Use this to give another proof that $\mathbb{R}$ is connected.

> 7. If every pair of points in M is contained in some connected set, show that M is 
itself connected. 

8. If E and F are nonempty subsets of M, and if $E \cup F$ is connected, show that $\overline{E} \cap \overline{F} \ne \emptyset$.


We are more than ready to speak of continuous functions and connectedness. Our 
first result shows that the two-point discrete space is the canonical disconnected set. 

Lemma 6.5. M is disconnected if and only if there exists a continuous map from 
M onto {0, 1} (the two-point discrete space). 

Proof. If $f : M \to \{0, 1\}$ is onto, then $A = f^{-1}(\{0\})$ and $B = f^{-1}(\{1\})$ are disjoint, nonempty, and satisfy $A \cup B = M$. If f is also continuous, then A and B are clopen sets and so form a disconnection of M.
Conversely, if A and B are a disconnection of M, then setting $f(a) = 0$ for $a \in A$ and $f(b) = 1$ for $b \in B$ defines a continuous map f from M onto $\{0, 1\}$. (Why?) □

Lemma 6.5 tells us that there is no continuous method of splitting a connected set M into two discrete “parcels.” More generally, it follows that M is connected if and only if every continuous map from M into a discrete space is necessarily constant.

Lemma 6.5 gives a nearly perfect replacement for the definition of disconnected. 
All of the notational difficulties that we faced earlier are now hidden in subtleties of 
language. For example, we have traded the cumbersome notation of relatively open 
sets for the tacit understanding that continuity may mean relative continuity. Most 
convenient. All of this hard work is beginning to pay off! In fact, we can now give a very 
short proof of that generalized intermediate value theorem we have been looking for: 

Theorem 6.6. Let $f : (M, d) \to (N, \rho)$ be continuous, and let E be a subset of M. If E is connected, then $f(E)$ is connected.

Proof. Suppose that $f(E)$ is not connected. Then there exists a continuous, onto map $g : f(E) \to \{0, 1\}$. But this means that $g \circ f : E \to \{0, 1\}$ is continuous and onto. That is, E is not connected. □

To see that Theorem 6.6 is a generalization of the intermediate value theorem, we just 
need to bring Theorem 6.4 back into the picture: The connected subsets of R (containing 
more than one point) are precisely the intervals. Thus, the image of an interval under a 
nonconstant continuous function is again an interval. 

Corollary 6.7. If I is an interval in $\mathbb{R}$ and if $f : I \to \mathbb{R}$ is a nonconstant continuous function, then $f(I)$ is an interval. In particular, if $a, b \in I$ with $f(a) \ne f(b)$, then f assumes every value between $f(a)$ and $f(b)$.
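Corollary 6.7 only asserts that a solution exists. Repeated halving turns it into a numerical procedure. The Python sketch below uses bisection, a standard technique that is not part of the text. At each step it keeps a subinterval on whose endpoints f − m has opposite signs, so the intermediate value theorem guarantees a solution inside it. We assume that f is continuous on [a, b], that m lies between f(a) and f(b), and that floating-point evaluation of f is accurate enough. The name `bisect_ivt` and its parameters are hypothetical.

```python
from typing import Callable

def bisect_ivt(f: Callable[[float], float], a: float, b: float, m: float,
               tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate a point c in [a, b] with f(c) = m, assuming f is continuous on
    [a, b] and m lies between f(a) and f(b)."""
    lo, hi = a, b
    g_lo, g_hi = f(lo) - m, f(hi) - m
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if (g_lo > 0) == (g_hi > 0):
        raise ValueError("m must lie between f(a) and f(b)")
    # Invariant: f - m has opposite (nonzero) signs at lo and hi, so by the
    # intermediate value theorem some solution lies in [lo, hi].
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        g_mid = f(mid) - m
        if g_mid == 0:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # same sign as at lo: a solution lies in [mid, hi]
        else:
            hi = mid                # sign change on [lo, mid]
    return (lo + hi) / 2

c = bisect_ivt(lambda x: x ** 3, 0.0, 2.0, 2.0)
print(c)  # approximately 1.259921, the cube root of 2
```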


EXERCISES 

Throughout, M denotes an arbitrary metric space with metric d.

> 9. If $A \subset B \subset \overline{A} \subset M$, and if A is connected, show that B is connected. In particular, $\overline{A}$ is connected.

10. True or false? If $A \subset B \subset C \subset M$, where A and C are connected, then B is connected.

11. An alternate definition of connectedness for metric spaces can be phrased in terms of continuous real-valued functions: Prove that M is disconnected if and only if there is a continuous function $f : M \to \mathbb{R}$ such that $f^{-1}(\{0\}) = \emptyset$ while $f^{-1}((-\infty, 0)) \ne \emptyset$ and $f^{-1}((0, \infty)) \ne \emptyset$. [Hint: If A and B are a disconnection, consider $f(x) = d(x, A) - d(x, B)$.]

12. If M is connected and has at least two points, show that M is uncountable. [Hint: Find a nonconstant, continuous, real-valued function on M.]
> 13. If $f : [a, b] \to [a, b]$ is continuous, show that f has a fixed point; that is, show that there is some point x in $[a, b]$ with $f(x) = x$.

14. Let $f : [0, 2] \to \mathbb{R}$ be continuous with $f(0) = f(2)$. Show that there is some x in $[0, 1]$ such that $f(x) = f(x + 1)$.

15. If $f : \mathbb{R} \to \mathbb{R}$ is continuous and open, show that f is strictly monotone.

16. If $f : \mathbb{R} \to \mathbb{R}$ is continuous and one-to-one, show that f is strictly monotone.

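17. Prove that there does not exist a continuous function $f : \mathbb{R} \to \mathbb{R}$ satisfying $f(\mathbb{Q}) \subset \mathbb{R} \setminus \mathbb{Q}$ and $f(\mathbb{R} \setminus \mathbb{Q}) \subset \mathbb{Q}$.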

18. Let A and B be closed subsets of M, and suppose that both $A \cup B$ and $A \cap B$ are connected. Prove that A and B are connected.

19. Let $I = (\mathbb{R} \setminus \mathbb{Q}) \cap [0, 1]$ and $Q = \mathbb{Q} \cap [0, 1]$, with their usual metrics. Prove that there is a continuous map from I onto Q, but that there does not exist a continuous map from $[0, 1]$ onto Q. [Hint: Given a sequence of rationals $0 = r_0 < r_1 < \cdots < r_n < \cdots < 1$ increasing to 1, notice that I can be written as the disjoint union of the open sets $(r_{n-1}, r_n) \cap I$, $n = 1, 2, \ldots$.]

20. Let $f : [a, b] \to \mathbb{R}$ be continuous, and suppose that f takes on no value more than twice. Show that f takes on some value exactly once. [Hint: Either the maximum or the minimum value occurs only once.] Consequently, f is piecewise monotone.

21. Suppose that $f : \mathbb{R} \to \mathbb{R}$ takes on each of its values exactly twice; that is, for each $y \in \mathbb{R}$, the set $\{x : y = f(x)\}$ has either 0 or 2 elements. Show that f is discontinuous at infinitely many points.

22. Suppose that $f : \mathbb{R} \to \mathbb{R}$ has the intermediate value property; that is, suppose that if $x < y$ with $f(x) \ne f(y)$, then f assumes every value intermediate to $f(x)$ and $f(y)$ on the interval $(x, y)$. If, in addition, we assume that the graph of f is closed in $\mathbb{R}^2$, prove that f is continuous. [Hint: If f is discontinuous at b, then there is a sequence $b_n \to b$ such that $|f(b_n) - f(b)| \ge \varepsilon$ for some $\varepsilon > 0$ and all n. By passing to a subsequence, we may suppose that, say, $f(b_n) \ge f(b) + \varepsilon$ for all n. How does this help?]

23. If $f : \mathbb{R} \to \mathbb{R}$ is differentiable, prove that $f'$ has the intermediate value property. Specifically, if $a < b$ and $f'(a) < m < f'(b)$, show that $f'(c) = m$ for some $c \in (a, b)$. [Hint: Consider $g(x) = f(x) - mx$.]


Although it follows easily from the definition (since it is given in terms of open 
sets), Theorem 6.6 also shows that connectedness is preserved by homeomorphisms. 
This observation allows us to clarify one of the harder examples from Chapter 
Five. 


Example 6.8 

Intervals that “look” different are different. Specifically, no pair of intervals from among $[a, b]$, $(a, b]$, and $(a, b)$ can be homeomorphic. The reasons we gave in Chapter Five can now be restated in terms of our new language. To see that $(a, b]$ and $(a, b)$ are not homeomorphic, for example, suppose that $f : (a, b] \to (a, b)$ is one-to-one and onto, and let $c = f(b)$. (Hence, $a < c < b$.) Then, the restriction of f to $(a, b] \setminus \{b\} = (a, b)$ is still one-to-one, but now its range is the disconnected set $(a, b) \setminus \{c\} = (a, c) \cup (c, b)$. Since f maps a connected set onto a disconnected set, f cannot be continuous. The other cases are similar.

The key observation in Example 6.8 is that, of two “different” intervals, one can always afford to lose more points than the other before becoming disconnected. For example, $[a, b]$ can afford to give up two points and still remain connected, whereas $(a, b]$ only has one point to spare. We could stretch this same reasoning to show that the unit interval $[0, 1]$ is not homeomorphic to the unit square $[0, 1] \times [0, 1]$, for example. For this, we first need a lemma:

Lemma 6.9. If A and B are connected, then $A \times B$ is connected.

Proof. Suppose that $f : A \times B \to \{0, 1\}$ is continuous. We need to show that f is constant. Given any $a \in A$ and $b' \in B$, each of the functions $f(a, \cdot) : B \to \{0, 1\}$ and $f(\cdot, b') : A \to \{0, 1\}$ is continuous. (This follows from what we know about “the” product metric; see Exercise 3.46.) Consequently, since A and B are connected, each of these new maps must be constant.

This means that f is constant along “horizontal” and “vertical” lines in $A \times B$. Thus, for any $(a, b)$ and $(a', b')$ in $A \times B$, we have $f(a, b) = f(a, b') = f(a', b')$, because $f(a, \cdot)$ and $f(\cdot, b')$ are constant and the two functions must agree at $(a, b')$. (Figure 6.1 may help: f is constant along each dotted line, and these values must agree at the “intersections.”) That is, f is constant. □


[Figure 6.1: the product A × B, with axes labeled A and B and the points a, b, and b′ marked.]


Thus $[0, 1] \times [0, 1]$ is connected, and now it is easy to see why $[0, 1] \times [0, 1]$ cannot be homeomorphic to $[0, 1]$. Indeed, $[0, 1] \setminus \{1/2\}$ is disconnected while $[0, 1] \times [0, 1]$ minus any point is still connected. (Why?) Similarly, $\mathbb{R}^2$ is connected, and essentially the same argument shows that $\mathbb{R}^2$ is not homeomorphic to $\mathbb{R}$. By induction, $\mathbb{R}^n$ is connected (we will outline a second proof in the exercises), and this line of reasoning can be used to show that $\mathbb{R}^n$ is not homeomorphic to $\mathbb{R}$ for $n > 1$. But the question of whether $\mathbb{R}^n$ is homeomorphic to $\mathbb{R}^m$ for arbitrary $n \ne m$ is very difficult! Nevertheless, the argument is the same in spirit (the “bigger” space is “more” connected), and it is in fact true that $\mathbb{R}^n$ is not homeomorphic to $\mathbb{R}^m$ for $n \ne m$.


EXERCISES 

24. Show that $(0, 1) \times (0, 1)$, although an open set in $\mathbb{R}^2$, cannot be written as a disjoint union of open balls in $\mathbb{R}^2$. (Compare with Theorem 4.6.)

25.

(a) Give an example of a continuous function having a connected range but a disconnected domain.

(b) Let $D \subset \mathbb{R}$, and let $f : D \to \mathbb{R}$ be continuous. Prove that D is connected if $\{(x, f(x)) : x \in D\}$, the graph of f, is a connected subset of $\mathbb{R}^2$.

26. Let $f : [0, 1] \to \mathbb{R}$ be defined by $f(x) = \sin(1/x)$ for $x \ne 0$ and $f(0) = 0$. Show that although f is not continuous, the graph of f is a connected subset of $\mathbb{R}^2$. [Hint: Use Exercise 9.]

27. Let V be a normed vector space, and let $x \ne y \in V$. Show that the map $f(t) = x + t(y - x)$ is a homeomorphism from $[0, 1]$ into V. The range of f is the line segment joining x and y, and it is often written $[x, y]$ (since f is a homeomorphism, this interval notation is justified). [Hint: That f is continuous and one-to-one is easy; next show that if $f(t_n) \to z$, then $(t_n)$ converges to some t in $[0, 1]$ with $z = f(t)$.]

28. Deduce from Exercises 7 and 27 that any normed space V is connected.


The full details will have to wait for a while, but we have enough “savvy” at this point to discuss an extremely curious and highly counterintuitive phenomenon. In spite of the fact that $[0, 1]$ and $[0, 1] \times [0, 1]$ are not homeomorphic, and in spite of the fact that the square $[0, 1] \times [0, 1]$ should, by rights, be much “bigger” than the interval $[0, 1]$, there exists a continuous onto map $f : [0, 1] \to [0, 1] \times [0, 1]$. (As we will see in Chapter Eight, no such map can be one-to-one. In fact, no continuous, one-to-one map from $[0, 1]$ to $[0, 1] \times [0, 1]$ can have a dense range.)

Now a map $f(t) = (x(t), y(t))$ from $[0, 1]$ to $[0, 1] \times [0, 1]$ is called a path, or curve. If the range of f “fills” the square, we say that f is a space-filling curve. The existence of any space-filling curve was considered quite shocking at one time, let alone a continuous space-filling curve! But, as is typical of such discoveries, once a continuous space-filling curve was shown to exist, dozens of other examples followed. We will briefly describe two such examples.

The first example is due to Peano in 1890. The idea is to define a sequence of paths that visit ever more points in the square; the “limit” path will be onto since it ultimately visits a dense set of points in the square (more on this in Chapter Eight). Figure 6.2 shows the first two paths.

Figure 6.2 shows the unit square broken into nine equal subsquares; the first path 
travels from (0, 0) to (1, 1) (i.e., from lower left to upper right) in a series of straight 
line paths, in the direction indicated by the circled numbers. 
Figure 6.3 shows each of the subsquares of Figure 6.2 broken into 9 equal subsquares, giving us 81 subsquares in all. The second path travels from (0, 0) to (1, 1) by repeating the first path in “miniature” in each 3 × 3 block of subsquares. The new path traverses each of the nine original subsquares in the same order as before (the path wends its way up the first column of 3 × 3 blocks, down the center column, and up the last column). Notice, too, that the direction of each of the nine “miniature” paths is determined by the direction of the corresponding segment of the first path. That is, we enter the first 3 × 3 block at the lower left and exit at the upper right; we enter the second 3 × 3 block at the lower right and exit at the upper left; we enter the third block at the lower left and exit at the upper right, and so on.

The third path is obtained by repeating this process in each of the 81 subsquares of Figure 6.3. That is, divide each subsquare into 9 more equal subsquares, giving us 729 in all, and repeat the first path in “microminiature” in each of the new 3 × 3 blocks. Continue. The limit of this process (which can be made rigorous) is a continuous path mapping $[0, 1]$ onto the square $[0, 1] \times [0, 1]$.
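Peano’s construction can also be described arithmetically, through the ternary digits of the parameter. The Python sketch below rests on that description. It assumes the standard digit formulas for Peano’s curve, in which each column or row digit is either kept or reflected as d ↦ 2 − d, depending on the parity of a sum of earlier digits. From these it lists the order in which the nth path visits the $9^n$ subsquares. At stage 1 the sketch reproduces the order described for Figure 6.2 (up the first column, down the middle, up the last). At stage 2, its first two 3 × 3 blocks are entered and exited at the corners described above. The function name `peano_cells` is hypothetical.

```python
from typing import List, Tuple

def peano_cells(n: int) -> List[Tuple[int, int]]:
    """Order in which the nth Peano path visits the 9**n subsquares of a
    3**n-by-3**n grid, as (column, row) pairs with (0, 0) at the lower left."""
    cells = []
    for idx in range(9 ** n):
        # Base-3 digits a_1, ..., a_2n of idx, most significant first (a[0] = a_1).
        a, m = [], idx
        for _ in range(2 * n):
            a.append(m % 3)
            m //= 3
        a.reverse()
        col = row = 0
        for j in range(n):
            # Column digit: a_(2j+1), reflected d -> 2 - d if a_2 + a_4 + ... + a_2j is odd.
            s_col = sum(a[1:2 * j:2])
            x = a[2 * j] if s_col % 2 == 0 else 2 - a[2 * j]
            # Row digit: a_(2j+2), reflected if a_1 + a_3 + ... + a_(2j+1) is odd.
            s_row = sum(a[0:2 * j + 1:2])
            y = a[2 * j + 1] if s_row % 2 == 0 else 2 - a[2 * j + 1]
            col, row = 3 * col + x, 3 * row + y
        cells.append((col, row))
    return cells

print(peano_cells(1))
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2)]
```

At every stage, consecutive cells in the list share an edge. This is a discrete shadow of the continuity of the limit path.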
Our second example of a space-filling curve is essentially due to Lebesgue in 1928. This one is even more amazing than Peano’s example (if such a thing is possible). Lebesgue’s idea is this: Since the Cantor function maps $\Delta$ (a tiny set) onto all of $[0, 1]$ (a big set), perhaps some variation on the Cantor function will map $\Delta$ onto the square $[0, 1] \times [0, 1]$ (an even bigger set). And it does!! Incroyable!

Here’s the setup: Recall that each element $t \in \Delta$ can be written as $t = \sum_{n=1}^\infty 2a_n/3^n$, where each $a_n$ is either 0 or 1; in other symbols, $t = 0.(2a_1)(2a_2)(2a_3)\ldots$ (base 3). Now define a map $t \mapsto (x(t), y(t))$ by $x(t) = 0.a_2a_4a_6\ldots$ (base 2) and $y(t) = 0.a_1a_3a_5\ldots$ (base 2). Each of $x(t)$ and $y(t)$ is rather like the Cantor function; each is continuous on $\Delta$, and each maps $\Delta$ onto $[0, 1]$. (Why?) Moreover, $x(t)$ and $y(t)$ extend to continuous functions on $[0, 1]$, and the path $f(t) = (x(t), y(t))$ is actually a continuous space-filling curve (which maps $\Delta$ onto $[0, 1] \times [0, 1]$). Amazing! And now that we know the “trick,” we can play this same game again to get a continuous map from $\Delta$ onto $[0, 1] \times [0, 1] \times [0, 1]$. Just take each element of $\Delta$, written as a ternary decimal, and “spread out” the ternary decimal to make up three binary decimals, this time using every third ternary digit: $0.a_1a_4a_7\ldots$, and so on. By induction, $[0, 1]^n$ is the continuous image of $\Delta$ for every $n \ge 1$. Unbelievable! What was counterintuitive and simply out of the question moments ago has reduced to “one small step” after the fact. (And it gets even better! But we will save that story for another day.)
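The digit shuffling behind Lebesgue’s curve is easy to experiment with. The Python sketch below works only with finite 0-1 digit lists, an assumption that keeps the arithmetic exact via `fractions.Fraction`. It sends the even-position digits $a_2, a_4, \ldots$ to x(t) and the odd-position digits $a_1, a_3, \ldots$ to y(t). For any pair of finite binary expansions, it also builds a point of $\Delta$ that maps to that pair. This last step is the heart of the “onto” claim. The sketch does not illustrate continuity or the extension to $[0, 1]$, and all helper names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

def from_binary(bits: List[int]) -> Fraction:
    """Value of 0.b_1 b_2 ... b_k (base 2)."""
    return sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))

def cantor_point(a: List[int]) -> Fraction:
    """t = sum of 2 a_n / 3^n, i.e., 0.(2a_1)(2a_2)... (base 3), for a finite 0-1 list."""
    return sum((Fraction(2 * d, 3 ** (n + 1)) for n, d in enumerate(a)), Fraction(0))

def lebesgue_xy(a: List[int]) -> Tuple[Fraction, Fraction]:
    """(x(t), y(t)) with x(t) = 0.a_2 a_4 a_6 ... and y(t) = 0.a_1 a_3 a_5 ... (base 2)."""
    # Python indices are 0-based: a[1::2] holds a_2, a_4, ...; a[0::2] holds a_1, a_3, ...
    return from_binary(a[1::2]), from_binary(a[0::2])

def preimage(xbits: List[int], ybits: List[int]) -> List[int]:
    """Interleave equal-length binary digit lists into a_1 a_2 ... so that
    lebesgue_xy returns (0.xbits, 0.ybits)."""
    if len(xbits) != len(ybits):
        raise ValueError("digit lists must have equal length")
    a: List[int] = []
    for xb, yb in zip(xbits, ybits):
        a += [yb, xb]   # a_(2j-1) = jth digit of y, a_(2j) = jth digit of x
    return a

a = preimage([1, 1], [0, 1])   # target point (x, y) = (3/4, 1/4)
print(cantor_point(a), lebesgue_xy(a))
# 26/81 (Fraction(3, 4), Fraction(1, 4))
```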


Notes and Remarks 

For complete details of the proof that $\mathbb{R}^n$ and $\mathbb{R}^m$ are not homeomorphic for $n \ne m$, see M. H. A. Newman [1951].

For a thorough discussion of topics related to the intermediate value theorem (Corollary 6.7), including the intermediate value property for derivatives (Exercise 23), see Boas [1960].

The brand of connectedness found in Exercise 28 is called pathwise connectedness (or, to be precise, arcwise connectedness). A space is pathwise connected if there is a path (a continuous map on $[0, 1]$) joining any pair of points in the space. Exercise 7 and Theorem 6.4 show that pathwise connected spaces are connected in our sense. The converse fails: in the example given in Exercise 26, the point (0, 0) cannot be connected to the rest of the graph by means of a path. Pathwise connectedness is older than connectedness; according to Willard [1970], Weierstrass used it as early as the 1880s. The modern version evolved through the efforts of several mathematicians, including Cantor, Jordan, Schoenflies, Lennes, Riesz, and Hausdorff. For a more complete history, see Wilder [1978, 1980].

For functions $f : \mathbb{R} \to \mathbb{R}$, continuity, the intermediate value property, and the connectedness of the graph of f (as a subset of $\mathbb{R}^2$) are essentially equivalent. For much more on this, see Burgess [1990]. Exercise 22 is based on the discussion in Burgess’s paper, but see also Boas [1960] and Randolph [1968].

Lebesgue’s simplification of Peano’s space-filling curve appears in his book, Leçons sur l’Intégration (Lebesgue [1928]), which was originally published as one of the volumes in Borel’s series of monographs. Lebesgue’s example was subsequently modified by I. Schoenberg in Schoenberg [1938]. For further details, see Schoenberg [1982] and Sagan [1986, 1992]. We will have more to say about the Schoenberg-Lebesgue curve later in the book.

Space-filling curves have been a constant source of fascination in the mathematical 
literature. New examples and simplifications of old examples continue to surface in 
popular journals; dozens of articles on space-filling curves have appeared in the Monthly 
over the years. Two such articles, one old and one new, are Moore [1900] and Holbrook 
[1991] (but see also Swift [1961], Wen [1983], and Lance and Thomas [1991]). Moore’s 
paper is particularly interesting; he discusses Hilbert’s example of a space-filling curve, 
Weierstrass’s nondifferentiable function, and other early work. Holbrook, on the other 
hand, takes a novel approach: He shows that a curve $f(t) = (x(t), y(t))$ is space-filling
whenever the coordinate functions $x(t)$ and $y(t)$ are stochastically independent. For
a discussion of space-filling curves in general, see Boas [1960] and the articles by
Whyburn [1942] and Hahn [1956b]. For a thorough treatment of related constructions,
see A. N. Singh [1969]. 
Totally Bounded Sets

At the end of Chapter Five we discussed the problem of defining a norm on C(M), the 
space of continuous, real-valued functions on a metric space M. We saw that an easy 
solution presents itself in the case where M is finite, and the suggestion was made that 
it is enough for M to be “like” a finite set. In this chapter we will come one step closer 
to making this vague suggestion precise. To begin, we consider sets that can be written 
as the union of finitely many small “parcels.” 

A set A in a metric space $(M, d)$ is said to be totally bounded if, given any $\varepsilon > 0$, there exist finitely many points $x_1, \dots, x_n \in M$ such that $A \subset \bigcup_{i=1}^n B_\varepsilon(x_i)$. That is, each $x \in A$ is within $\varepsilon$ of some $x_i$. For this reason, some authors would say that the set $\{x_1, \dots, x_n\}$ is $\varepsilon$-dense in A, or that $\{x_1, \dots, x_n\}$ is an $\varepsilon$-net for A. For our purposes, we will paraphrase the statement $A \subset \bigcup_{i=1}^n B_\varepsilon(x_i)$ by saying that A is covered by finitely many $\varepsilon$-balls.

In the definition of a totally bounded set A, we could easily insist that each $\varepsilon$-ball be centered at a point of A. Indeed, given $\varepsilon > 0$, choose $x_1, \dots, x_n \in M$ so that $A \subset \bigcup_{i=1}^n B_{\varepsilon/2}(x_i)$. We may certainly assume that $A \cap B_{\varepsilon/2}(x_i) \ne \emptyset$ for each i, and so we may choose a point $y_i \in A \cap B_{\varepsilon/2}(x_i)$ for each i. By the triangle inequality, we then have $A \subset \bigcup_{i=1}^n B_\varepsilon(y_i)$. (Why?) That is, A can be covered by finitely many $\varepsilon$-balls, each centered at a point in A. More to the point, a set A is totally bounded if and only if, for every $\varepsilon > 0$, A can be covered by finitely many arbitrary sets of diameter at most $\varepsilon$.
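The greedy procedure below is one standard way to produce an $\varepsilon$-net whose centers lie in the set itself, in the spirit of the remark above. It is a Python sketch that assumes A is a finite sample of points in Euclidean space. Every finite set is totally bounded, so the sketch illustrates only how the centers can be chosen in A; it says nothing about the strength of the definition. `greedy_net` is a hypothetical name, and `math.dist` is the standard-library Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def greedy_net(points: Sequence[Point], eps: float) -> List[Point]:
    """Return centers, all taken from `points`, such that every point lies
    within distance < eps of some center."""
    centers: List[Point] = []
    for p in points:
        # p becomes a new center only if no existing center is within eps of it;
        # otherwise it is already covered.
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

# Sample A: a 21-by-21 grid in the unit square (a finite stand-in for a set in R^2).
A = [(i / 20, j / 20) for i in range(21) for j in range(21)]
net = greedy_net(A, 0.25)
print(len(net), all(min(math.dist(p, c) for c in net) < 0.25 for p in A))
```

As a by-product, any two of the chosen centers are at distance at least $\varepsilon$ from each other.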

Lemma 7.1. A is totally bounded if and only if, given $\varepsilon > 0$, there are finitely many sets $A_1, \dots, A_n \subset A$, with $\operatorname{diam}(A_i) < \varepsilon$ for all i, such that $A \subset \bigcup_{i=1}^n A_i$.

Proof. First suppose that A is totally bounded. Given $\varepsilon > 0$, we may choose $x_1, \dots, x_n \in M$ such that $A \subset \bigcup_{i=1}^n B_\varepsilon(x_i)$. As above, A is then covered by the sets $A_i = A \cap B_\varepsilon(x_i) \subset A$, and $\operatorname{diam}(A_i) \le 2\varepsilon$ for each i.

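Conversely, given $\varepsilon > 0$, suppose that there are finitely many sets $A_1, \dots, A_n \subset A$, with $\operatorname{diam}(A_i) < \varepsilon$ for all i, such that $A \subset \bigcup_{i=1}^n A_i$. Discarding any empty $A_i$ and choosing $x_i \in A_i$, we then have $A_i \subset B_\varepsilon(x_i)$ for each i and, hence, $A \subset \bigcup_{i=1}^n B_\varepsilon(x_i)$.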

Since e is arbitrary in either case, we are done. □ 

Notice that the condition in Lemma 7.1 demands that $A_1, \dots, A_n$ be subsets of A. This is no real constraint since, after all, if A is covered by $B_1, \dots, B_n \subset M$, then A is also covered by the sets $A_i = A \cap B_i \subset A$, and $\operatorname{diam}(A_i) \le \operatorname{diam}(B_i)$.
Examples 7.2

(a) By the triangle inequality, a totally bounded set is necessarily bounded. (Why?) 
Note also that any subset of a totally bounded set is again totally bounded. (See 
Exercise 1.) 

(b) A finite set is always totally bounded. In a discrete space, a set is totally bounded 
if and only if it is finite. (Why?) 

(c) In R, we do not get anything new: A subset of R is totally bounded if and only if it is bounded. (See Exercise 2.) Thus, total boundedness is apparently not a topological property; it depends intimately on the metric at hand. (See Exercise 3.)

(d) In general, not every bounded set is totally bounded. The discrete metric gives us a clue as to how we might construct such a set. Recall the sequence $e^{(n)} = (0, \ldots, 0, 1, 0, \ldots)$ in $\ell_1$, where the single nonzero entry is in the nth place. Then, $\{e^{(n)} : n \ge 1\}$ is a bounded set in $\ell_1$, since $\|e^{(n)}\|_1 = 1$ for all n, but not totally bounded. Why? Because $\|e^{(m)} - e^{(n)}\|_1 = 2$ for $m \ne n$; thus, any set of diameter less than 2 contains at most one $e^{(n)}$, and so $\{e^{(n)} : n \ge 1\}$ cannot be covered by finitely many such sets (in particular, not by finitely many balls of radius less than 1). In fact, the set $\{e^{(n)} : n \ge 1\}$ is discrete in its relative metric. (Compare with Exercise 8.)


EXERCISES 

Except where noted, each of the following exercises is set in an arbitrary metric space 
M with metric d. 

▷ 1. If $A \subset B \subset M$, and if B is totally bounded, show that A is totally bounded.

▷ 2. Show that a subset A of R is totally bounded if and only if it is bounded. In particular, if I is a closed, bounded interval in R and $\varepsilon > 0$, show that I can be covered by finitely many closed subintervals $J_1, \ldots, J_n$, each of length at most $\varepsilon$.

3. Is total boundedness preserved by homeomorphisms? Explain. [Hint: R is homeomorphic to (0, 1).]

4. Show that A is totally bounded if and only if A can be covered by finitely many 
closed sets of diameter at most e for every e > 0. 

▷ 5. Prove that A is totally bounded if and only if its closure $\bar{A}$ is totally bounded.


We next give a sequential criterion for total boundedness. The key observation is 
isolated in: 

Lemma 7.3. Let $(x_n)$ be a sequence in $(M, d)$, and let $A = \{x_n : n \ge 1\}$ be its range.

(i) If $(x_n)$ is Cauchy, then A is totally bounded.

(ii) If A is totally bounded, then $(x_n)$ has a Cauchy subsequence.

proof. (i) Let $\varepsilon > 0$. Then, since $(x_n)$ is Cauchy, there is some index $N \ge 1$ such that $\operatorname{diam}\{x_n : n \ge N\} < \varepsilon$. Thus:

$$A = \underbrace{\{x_1\} \cup \cdots \cup \{x_{N-1}\} \cup \{x_n : n \ge N\}}_{N \text{ sets of diameter } < \varepsilon}.$$

(ii) If A is a finite set, we are done. (Why?) So, suppose that A is an infinite totally bounded set. Then A can be covered by finitely many sets of diameter < 1. One of these sets, at least, must contain infinitely many points of A. Call this set $A_1$. But then $A_1$ is also totally bounded, and so it can be covered by finitely many sets of diameter < 1/2. One of these, call it $A_2$, contains infinitely many points of $A_1$. Continuing this process, we find a decreasing sequence of sets $A \supset A_1 \supset A_2 \supset \cdots$, where each $A_k$ contains infinitely many $x_n$ and where $\operatorname{diam}(A_k) < 1/k$. In particular, we may choose a subsequence $(x_{n_k})$ with $x_{n_k} \in A_k$ for all k. (How?) That $(x_{n_k})$ is Cauchy is now clear since $\operatorname{diam}\{x_{n_j} : j \ge k\} \le \operatorname{diam}(A_k) < 1/k$. □
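The selection in (ii) can be imitated on a computer for a sequence in [0, 1], under two stated assumptions: we only see a finite prefix of the sequence, and "contains infinitely many terms" is replaced by "contains the most of the remaining terms." The covering sets are the two closed halves of the current interval, so $A_k$ is an interval of length $2^{-k} \le 1/k$. The function name `cauchy_subsequence` and the test sequence $x_n = n\sqrt{2} \bmod 1$ are our own illustrative choices, not the book's.

```python
import math


def cauchy_subsequence(xs, steps):
    """Finite-sample sketch of the selection in Lemma 7.3 (ii) for terms in [0, 1].

    At each step the current interval is split in half, and we keep the half
    holding the most terms whose index exceeds the last index chosen; the next
    index n_k is the smallest such index in that half.
    Returns the indices n_1 < n_2 < ... and the nested intervals A_1, A_2, ...
    """
    lo, hi = 0.0, 1.0
    indices, intervals = [], []
    last = -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        remaining = range(last + 1, len(xs))
        left = [i for i in remaining if lo <= xs[i] <= mid]
        right = [i for i in remaining if mid <= xs[i] <= hi]
        # Keep the half that still holds more of the remaining terms.
        if len(left) >= len(right):
            hi, pool = mid, left
        else:
            lo, pool = mid, right
        if not pool:
            break  # the finite prefix has no terms left in this half
        last = pool[0]
        indices.append(last)
        intervals.append((lo, hi))
    return indices, intervals


xs = [(n * math.sqrt(2)) % 1.0 for n in range(1, 20001)]
idx, ivs = cauchy_subsequence(xs, 10)
sub = [xs[i] for i in idx]
for k in range(len(sub)):
    # Every later term of the subsequence stays inside the k-th interval.
    print(k + 1, ivs[k], max(sub[k:]) - min(sub[k:]))
```

Because the intervals are nested, every term chosen after step k lies in $A_k$, which is exactly why the subsequence is Cauchy.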

Examples 7.4 

(a) The sequence $x_n = (-1)^n$ in R shows that a Cauchy subsequence is the best that we can hope for in Lemma 7.3 (ii).

(b) Note that the sequence $(e^{(n)})$ in $\ell_1$ has no Cauchy subsequence.

We are finally ready for our sequential characterization of total boundedness: 

Theorem 7.5. A set A is totally bounded if and only if every sequence in A has 
a Cauchy subsequence. 

proof. The forward implication is clear from Lemma 7.3. To prove the backward implication, suppose that A is not totally bounded. Then, there is some $\varepsilon > 0$ such that A cannot be covered by finitely many $\varepsilon$-balls. Thus, by induction, we can find a sequence $(x_n)$ in A such that $d(x_n, x_m) \ge \varepsilon$ whenever $m \ne n$. (How?) But then, $(x_n)$ has no Cauchy subsequence. □
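The induction in the proof can be run greedily on a finite sample of points: keep a point whenever it is at distance at least $\varepsilon$ from every point kept so far. This is a sketch under our own assumptions (finite point lists, the hypothetical helper `greedy_separated`, and the sample sets below). On a totally bounded set such as the unit circle, only a few dozen of the 1000 sample points are kept, and the kept points form an $\varepsilon$-net; on the vectors $e^{(n)}$ of Example 7.2 (d), truncated to 50 coordinates, every vector is kept, which is the $\varepsilon$-separated sequence the proof produces.

```python
import math


def greedy_separated(points, eps, dist):
    """Keep each point that is at distance >= eps from every point kept so far.

    The kept points are pairwise >= eps apart, and every input point lies
    within distance < eps of some kept point (a point is skipped only when
    it is that close to one already kept), so the kept points form an
    eps-net for the input.
    """
    kept = []
    for p in points:
        if all(dist(p, q) >= eps for q in kept):
            kept.append(p)
    return kept


# A totally bounded example: 1000 points on the unit circle in R^2.
circle = [(math.cos(2 * math.pi * k / 1000), math.sin(2 * math.pi * k / 1000))
          for k in range(1000)]
net = greedy_separated(circle, 0.25, math.dist)


def dist1(x, y):
    """The ell_1 distance between two finite tuples."""
    return sum(abs(a - b) for a, b in zip(x, y))


# The vectors e^(n), truncated to 50 coordinates: pairwise ell_1 distance is 2.
basis = [tuple(1.0 if k == n else 0.0 for k in range(50)) for n in range(50)]
kept_basis = greedy_separated(basis, 1.0, dist1)

print(len(net), len(kept_basis))
```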

All of this should remind you of the Bolzano-Weierstrass theorem, and for good reason:

Corollary 7.6. (The Bolzano-Weierstrass Theorem) Every bounded infinite subset of R has a limit point in R.

proof. Let A be a bounded infinite subset of R. Then, in particular, there is a sequence $(x_n)$ of distinct points in A. Since A is totally bounded (Example 7.2 (c)), there is a Cauchy subsequence $(x_{n_k})$ of $(x_n)$. But Cauchy sequences in R converge, and so $(x_{n_k})$ converges to some $x \in \mathbb{R}$. Since the $x_{n_k}$ are distinct, x is a limit point of A. □


EXERCISES 

Unless otherwise specified, each of the following exercises is set in a generic metric space $(M, d)$.

6. Prove that A is totally bounded if and only if every sequence $(x_n)$ in A has a subsequence $(x_{n_k})$ for which $d(x_{n_k}, x_{n_{k+1}}) < 2^{-k}$.

7. Show that Corollary 7.6 follows from the nested interval theorem. 

8. If A is not totally bounded, show that A has an infinite subset B that is homeomorphic to a discrete space (where B is supplied with its relative metric). [Hint: Find $\varepsilon > 0$ and a sequence $(x_n)$ in A such that $d(x_n, x_m) \ge \varepsilon$ for $n \ne m$. How does this help?]

▷ 9. Give an example of a closed bounded subset of $\ell_\infty$ that is not totally bounded.

▷ 10. Prove that a totally bounded metric space M is separable. [Hint: For each n, let $D_n$ be a finite $(1/n)$-net for M. Show that $D = \bigcup_{n=1}^\infty D_n$ is a countable dense set.]

11. Prove that $H^\infty$ is totally bounded (see Exercises 3.10 and 4.48).


Complete Metric Spaces 

As you can now well imagine, we want to isolate the class of metric spaces in which Cauchy sequences always converge. It follows from Theorem 7.5 that we would have an analogue of the Bolzano-Weierstrass theorem in such spaces (see Theorem 7.11). In fact, we will find that this class of metric spaces has much in common with the real line R.

A metric space M is said to be complete if every Cauchy sequence in M converges (to a point in M!).

Examples 7.7 

(a) R is complete. This is a consequence of the least upper bound axiom; in fact, 
as we will see, the completeness of R is actually equivalent to the least upper 
bound axiom. 

(b) $\mathbb{R}^n$ is complete (because R is).

(c) Any discrete space is complete (trivially). 

(d) (0, 1 ) is not complete. (Why?) Hence, completeness is not preserved by homeo- 
morphisms. Which subsets of R are complete? 

(e) $c_0$, $\ell_1$, $\ell_2$, and $\ell_\infty$ are all complete. The proofs are all very similar; we sketch the proof for $\ell_2$ below and leave the rest as exercises.

(f) $C[a, b]$ is complete. The proof is not terribly difficult, but it will best serve our purposes to postpone it until Chapter Ten, where several similar proofs are collected.

The proof that $\ell_2$ is complete is based on a few simple principles that will generalize to all sorts of different settings. This generality will become all the more apparent if we introduce a slight change in our notation. Since a sequence is just another name for a function on N, let’s agree to write an element $f \in \ell_2$ as $f = (f(k))_{k=1}^\infty$, in which case $\|f\|_2 = \left( \sum_{k=1}^\infty |f(k)|^2 \right)^{1/2}$. For example, the notorious vectors $e^{(n)}$ will now be written $e_n$, where $e_n(k) = \delta_{n,k}$. (This is Kronecker’s delta, defined by $\delta_{n,k} = 1$ if $n = k$ and $\delta_{n,k} = 0$ otherwise.)

Let $(f_n)$ be a sequence in $\ell_2$, where now we write $f_n = (f_n(k))_{k=1}^\infty$, and suppose that $(f_n)$ is Cauchy in $\ell_2$. That is, suppose that for each $\varepsilon > 0$ there exists an $n_0$ such that $\|f_n - f_m\|_2 < \varepsilon$ whenever $m, n \ge n_0$. Of course, we want to show that $(f_n)$ converges, in the metric of $\ell_2$, to some $f \in \ell_2$. We will break the proof into three steps:

Step 1. $f(k) = \lim_{n \to \infty} f_n(k)$ exists in R for each k.

To see why, note that $|f_n(k) - f_m(k)| \le \|f_n - f_m\|_2$ for any k, and hence $(f_n(k))_{n=1}^\infty$ is Cauchy in R for each k. Thus, f is the obvious candidate for the limit of $(f_n)$, but we still have to show that the convergence takes place in the metric space $\ell_2$; that is, we need to show that $f \in \ell_2$ and that $\|f_n - f\|_2 \to 0$ (as $n \to \infty$).


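Step 2. $f \in \ell_2$; that is, $\|f\|_2 < \infty$.

We know that $(f_n)$ is bounded in $\ell_2$ (why?); say, $\|f_n\|_2 \le B$ for all n. Thus, for any fixed $N < \infty$, we have:

$$\sum_{k=1}^{N} |f(k)|^2 = \lim_{n \to \infty} \sum_{k=1}^{N} |f_n(k)|^2 \le B^2.$$

Since this holds for any N, we get that $\|f\|_2 \le B$.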

Step 3. Now we repeat Step 2 (more or less) to show that $f_n \to f$ in $\ell_2$.

Given $\varepsilon > 0$, choose $n_0$ so that $\|f_n - f_m\|_2 < \varepsilon$ whenever $m, n \ge n_0$. Then, for any N and any $n \ge n_0$,

$$\sum_{k=1}^{N} |f(k) - f_n(k)|^2 = \lim_{m \to \infty} \sum_{k=1}^{N} |f_m(k) - f_n(k)|^2 \le \varepsilon^2.$$

Since this holds for any N, we have $\|f - f_n\|_2 \le \varepsilon$ for all $n \ge n_0$. That is, $f_n \to f$ in $\ell_2$.
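A numerical illustration of Steps 1 to 3 (not a proof), using a sequence of our own choosing rather than one from the text: $f_n = (1, 1/2, \ldots, 1/n, 0, 0, \ldots)$, whose coordinatewise limit is $f(k) = 1/k$. Infinite sums are truncated at an assumed cutoff `K`, and the helper names are hypothetical.

```python
import math

K = 100_000  # truncation: coordinates beyond K are ignored (an assumption)


def f(k):
    """Coordinatewise limit f(k) = 1/k; f is in ell_2."""
    return 1.0 / k


def f_n(n, k):
    """f_n agrees with f in the first n coordinates and is 0 afterwards."""
    return f(k) if k <= n else 0.0


def dist2(n, m):
    """Truncated ell_2 distance ||f_n - f_m||_2 over coordinates 1..K."""
    return math.sqrt(sum((f_n(n, k) - f_n(m, k)) ** 2 for k in range(1, K + 1)))


def dist_to_limit(n):
    """Truncated ||f - f_n||_2, i.e. sqrt of the sum of 1/k^2 for n < k <= K."""
    return math.sqrt(sum(f(k) ** 2 for k in range(n + 1, K + 1)))


for n in (10, 100, 1000):
    print(n, dist_to_limit(n), dist2(n, 10 * n))
```

The tail sums shrink roughly like $1/\sqrt{n}$, so the coordinatewise limit from Step 1 is also the $\ell_2$ limit here, as Step 3 guarantees in general.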


Examples 7.8 

(a) Just having a candidate for a limit is not enough. Consider the sequence $(f_n)$ in $\ell_\infty$ defined by $f_n = (1, \ldots, 1, 0, \ldots)$, where the first n entries are 1 and the rest are 0. The “obvious” limit is $f = (1, 1, \ldots)$ (all 1), but $\|f - f_n\|_\infty = 1$ for all n. What’s wrong?

(b) Worse still, sometimes the “obvious” limit is not even in the space. Consider the same sequence as in (a) and note that each $f_n$ is actually an element of $c_0$. This time, the natural candidate f is not in $c_0$. Again, what’s wrong?

As you can see, there can be a lot of details to check in a proof of completeness, and 
it would be handy to have at least a few easy cases available. For example, when is a 
subset of a complete space complete? The answer is given as: 

Theorem 7.9. Let $(M, d)$ be a complete metric space and let A be a subset of M. Then, $(A, d)$ is complete if and only if A is closed in M.

proof. First suppose that $(A, d)$ is complete, and let $(x_n)$ be a sequence in A that converges to some point $x \in M$. Then $(x_n)$ is Cauchy in $(A, d)$ and so converges to some point of A. Since limits are unique, we must have $x \in A$ and, hence, A is closed.

Next suppose that A is closed, and let $(x_n)$ be a Cauchy sequence in $(A, d)$. Then $(x_n)$ is also Cauchy in $(M, d)$. (Why?) Hence, $(x_n)$ converges to some point $x \in M$. But A is closed and so, in fact, $x \in A$. Thus, $(A, d)$ is complete. □

Examples 7.10 

(a) $[0, 1]$, $[0, \infty)$, N, and $\Delta$ (the Cantor set) are all complete.

(b) It follows from Theorem 7.5 that if a metric space $(M, d)$ is both complete and totally bounded, then every sequence in M has a convergent subsequence. In particular, any closed, bounded subset of R is both complete and totally bounded. Thus, for example, every sequence in $[a, b]$ has a convergent subsequence. As you can easily imagine, the interval $[a, b]$ is a great place to do analysis! We will pursue the consequences of this felicitous combination of properties in the next chapter.


EXERCISES 


Unless otherwise stated, $(M, d)$ denotes an arbitrary metric space.

▷ 12. Let A be a subset of an arbitrary metric space $(M, d)$. If $(A, d)$ is complete, show that A is closed in M.


13. Show that R endowed with the metric $\rho(x, y) = |\arctan x - \arctan y|$ is not complete. How about if we try $\tau(x, y) = |x^3 - y^3|$?

14. If we define

$$d(m, n) = \left| \frac{1}{m} - \frac{1}{n} \right|$$

for $m, n \in \mathbb{N}$, show that d is equivalent to the usual metric on N but that $(\mathbb{N}, d)$ is not complete.

15. Prove or disprove: If M is complete and $f : (M, d) \to (N, \rho)$ is continuous, then $f(M)$ is complete.

▷ 16. Prove that $\mathbb{R}^n$ is complete under any of the norms $\|\cdot\|_1$, $\|\cdot\|_2$, or $\|\cdot\|_\infty$. [This is interesting because completeness is not usually preserved by the mere equivalence of metrics. Here we use the fact that all of the metrics involved are generated by norms. Specifically, we need the norms in question to be equivalent as functions: $\|\cdot\|_\infty \le \|\cdot\|_2 \le \|\cdot\|_1 \le n\|\cdot\|_\infty$. As we will see later, any two norms on $\mathbb{R}^n$ are comparable in this way.]

17. Given metric spaces M and N, show that $M \times N$ is complete if and only if both M and N are complete.

▷ 18. Fill in the details of the proofs that $\ell_1$ and $\ell_\infty$ are complete.

19. Prove that $c_0$ is complete by showing that $c_0$ is closed in $\ell_\infty$. [Hint: If $(f_n)$ is a sequence in $c_0$ converging to $f \in \ell_\infty$, note that $|f(k)| \le |f(k) - f_n(k)| + |f_n(k)|$. Now choose n so that $|f(k) - f_n(k)|$ is small independent of k.]

20. If $(x_n)$ and $(y_n)$ are Cauchy in $(M, d)$, show that $(d(x_n, y_n))_{n=1}^\infty$ is Cauchy in R.

21. If $(M, d)$ is complete, prove that two Cauchy sequences $(x_n)$ and $(y_n)$ have the same limit if (and only if) $d(x_n, y_n) \to 0$.

22. Let D be a dense subset of a metric space M, and suppose that every Cauchy 
sequence from D converges to some point of M. Prove that M is complete. 

23. Prove that M is complete if and only if every sequence $(x_n)$ in M satisfying $d(x_n, x_{n+1}) < 2^{-n}$, for all n, converges to a point of M.

24. Prove that the Hilbert cube $H^\infty$ (Exercise 3.10) is complete.

25. True or false? If $f : \mathbb{R} \to \mathbb{R}$ is continuous and if $(x_n)$ is Cauchy, then $(f(x_n))$ is Cauchy. Examples? How about if we insist that f be strictly increasing? Show that the answer is “true” if f is Lipschitz.


Our next result underlines the fact that complete spaces have a lot in common 
with R. 

Theorem 7.11. For any metric space $(M, d)$, the following statements are equivalent:

(i) $(M, d)$ is complete.

(ii) (The Nested Set Theorem) Let $F_1 \supset F_2 \supset F_3 \supset \cdots$ be a decreasing sequence of nonempty closed sets in M with $\operatorname{diam}(F_n) \to 0$. Then, $\bigcap_{n=1}^\infty F_n \ne \emptyset$ (in fact, it contains exactly one point).

(iii) (The Bolzano-Weierstrass Theorem) Every infinite, totally bounded subset 
of M has a limit point in M. 

proof. (i) ⇒ (ii): (Compare this with the proof of the nested interval theorem, Theorem 1.5.) Given $(F_n)$ as in (ii), choose $x_n \in F_n$ for each n. Then, since the $F_n$ decrease, $\{x_k : k \ge n\} \subset F_n$ for each n, and hence $\operatorname{diam}\{x_k : k \ge n\} \le \operatorname{diam}(F_n) \to 0$ as $n \to \infty$. That is, $(x_n)$ is Cauchy. Since M is complete, we have $x_n \to x$ for some $x \in M$. But the $F_n$ are closed, and so we must have $x \in F_n$ for all n. Thus, $\bigcap_{n=1}^\infty F_n \ne \emptyset$.

(ii) ⇒ (iii): Let A be an infinite, totally bounded subset of M. Recall that we have shown that A contains a Cauchy sequence $(x_n)$ comprised of distinct points ($x_n \ne x_m$ for $n \ne m$). Now, setting $A_n = \{x_k : k \ge n\}$, we get $A \supset A_1 \supset A_2 \supset \cdots$, each $A_n$ is nonempty (even infinite), and $\operatorname{diam}(A_n) \to 0$. That is, (ii) almost applies; the only catch is that the $A_n$ need not be closed. But, clearly, the closures satisfy $\bar{A}_n \supset \bar{A}_{n+1} \ne \emptyset$ for each n, and $\operatorname{diam}(\bar{A}_n) = \operatorname{diam}(A_n) \to 0$ as $n \to \infty$. Thus there exists an $x \in \bigcap_{n=1}^\infty \bar{A}_n$. Now $x_n \in A_n$ implies that $d(x_n, x) \le \operatorname{diam}(\bar{A}_n) \to 0$. That is, $x_n \to x$ and so x is a limit point of A (see Exercise 4.33).

(iii) ⇒ (i): Let $(x_n)$ be Cauchy in $(M, d)$. We just need to show that $(x_n)$ has a convergent subsequence. Now, by Lemma 7.3, the set $A = \{x_n : n \ge 1\}$ is totally bounded. If A happens to be finite, we are done. (Why?) Otherwise, (iii) tells us that A has a limit point $x \in M$. It follows that some subsequence of $(x_n)$ converges to x. (Why?) □





Completeness 


In particular, note that Theorem 7.11 holds for M = R. In this case, each of the three statements in Theorem 7.11 is equivalent to the least upper bound axiom. That is, we might have instead assumed one of these three as an axiom for R and then deduced the existence of least upper bounds as a corollary. What’s more, the fact that monotone, bounded sequences converge in R is also equivalent to the least upper bound axiom. (See the discussion following Theorem 1.5.) In R, then, completeness takes on multiple personalities, with each new persona directly related to the order properties of the real numbers.


EXERCISES 

Each of the following exercises is set in a metric space M with metric d. 

▷ 26. Just as with the nested interval theorem, it is essential that the sets $F_n$ used in the nested set theorem be both closed and bounded. Why? Is the condition $\operatorname{diam}(F_n) \to 0$ really necessary? Explain.

▷ 27. Note that the version of the Bolzano-Weierstrass theorem given in Theorem 7.11 replaces boundedness with total boundedness. Is this really necessary? Explain.

28. Suppose that every countable, closed subset of M is complete. Prove that M is complete.

29. Prove that M is complete if and only if, for each r > 0, the closed ball $\{y \in M : d(x, y) \le r\}$ is complete.

30. If $(M, d)$ is complete, prove that every open subset G of M is homeomorphic to a complete metric space. [Hint: Let $F = M \setminus G$ and consider the metric $\rho(x, y) = d(x, y) + \left| (d(x, F))^{-1} - (d(y, F))^{-1} \right|$ on G.]


In any normed vector space, the extra algebraic structure makes completeness some- 
what easier to test. That this is so can be seen through a clever observation due to Stefan 
Banach. In fact, Banach made so many clever observations about completeness that we 
now refer to a complete normed vector space as a Banach space. 

Here’s the setup: Given a sequence $(x_n)$ in a normed vector space X, the series $\sum_{n=1}^\infty x_n$ is said to converge in X if the sequence of partial sums $\sum_{n=1}^N x_n$ converges to some vector $x \in X$, that is, if $\left\| x - \sum_{n=1}^N x_n \right\| \to 0$ as $N \to \infty$. In this case we write, as usual, $x = \sum_{n=1}^\infty x_n$ and we say that $\sum_{n=1}^\infty x_n$ is summable to x. In other words, $\sum_{n=1}^\infty x_n$ is the name that we give to the limit of the partial sums.

Now, just as in R, sequences and series are interchangeable: Each series is really a sequence of partial sums and, conversely, each sequence is the sequence of partial sums for some series. In particular, notice that $x_n = x_1 + \sum_{i=2}^n (x_i - x_{i-1})$. The sequence $(x_n)$ and the series $\sum_{i=2}^\infty (x_i - x_{i-1})$ live or die together; both converge or both diverge. With this tool at our disposal (and Banach’s help, of course), it is not hard to see that the question of completeness for a normed space can be settled by a simple test:

Theorem 7.12. A normed vector space X is complete if and only if every absolutely summable series in X is summable. That is, X is complete if and only if $\sum_{n=1}^\infty x_n$ converges in X whenever $\sum_{n=1}^\infty \|x_n\| < \infty$.

proof. First suppose that X is complete, and let $(x_n)$ be a sequence in X for which $\sum_{n=1}^\infty \|x_n\| < \infty$. If we write $s_n = \sum_{k=1}^n x_k$ for the sequence of partial sums, then, for m > n, the triangle inequality yields

$$\|s_m - s_n\| = \left\| \sum_{k=n+1}^{m} x_k \right\| \le \sum_{k=n+1}^{m} \|x_k\|.$$

Since the partial sums of $\sum_{n=1}^\infty \|x_n\|$ form a convergent (and hence Cauchy) sequence, we have that $\sum_{k=n+1}^m \|x_k\| \to 0$ as $m, n \to \infty$. Thus, $(s_n)$ is also a Cauchy sequence and, as such, converges in X.

Next suppose that absolutely summable series in X are summable, and let $(x_n)$ be a Cauchy sequence in X. As always, it is enough to find a subsequence of $(x_n)$ that converges. To this end, choose a subsequence $(x_{n_k})$ for which $\|x_{n_{k+1}} - x_{n_k}\| < 2^{-k}$ for all k. (How?) Then, in particular, $\sum_{k=1}^\infty \|x_{n_{k+1}} - x_{n_k}\|$ converges. Consequently, the series $\sum_{k=1}^\infty (x_{n_{k+1}} - x_{n_k})$ converges in X. As we remarked earlier, this means that the sequence $x_{n_{m+1}} = x_{n_1} + \sum_{k=1}^m (x_{n_{k+1}} - x_{n_k})$ converges in X. □
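The first half of the proof can be watched numerically in the normed space $\mathbb{R}^2$ with the Euclidean norm. The series here is our own example, not the book's: $x_k = (\cos k, \sin k)/k^2$, so $\|x_k\| = 1/k^2$ and the series is absolutely summable. The sketch checks the estimate $\|s_m - s_n\| \le \sum_{k=n+1}^m \|x_k\|$ for a few pairs $m > n$; the helper names are hypothetical.

```python
import math


def term(k):
    """x_k = (cos k, sin k) / k**2 in R^2, so ||x_k||_2 = 1 / k**2."""
    return (math.cos(k) / k ** 2, math.sin(k) / k ** 2)


def partial_sums(n_max):
    """Return [s_0, s_1, ..., s_{n_max}] with s_n = x_1 + ... + x_n."""
    s = [(0.0, 0.0)]
    for k in range(1, n_max + 1):
        x, y = s[-1]
        dx, dy = term(k)
        s.append((x + dx, y + dy))
    return s


s = partial_sums(2000)
for n, m in [(10, 100), (100, 1000), (1000, 2000)]:
    gap = math.dist(s[m], s[n])
    bound = sum(1 / k ** 2 for k in range(n + 1, m + 1))
    # The triangle-inequality step of the proof: gap <= bound.
    print(n, m, gap, bound)
```

Since the bounds are tails of the convergent series $\sum 1/k^2$, the partial sums are Cauchy, which is exactly how completeness enters.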

There is never too much of a good thing: Note that Theorem 7.12 gives us yet another 
characterization of completeness in R. The familiar fact that every absolutely summable 
series of real numbers is summable is actually equivalent to the least upper bound axiom. 


EXERCISES 

31. If $\sum_{n=1}^\infty x_n$ is a convergent series in a normed vector space X, show that

32. Use Theorem 7.12 to prove that $\ell_1$ is complete.

33. Let s denote the vector space of all finitely nonzero real sequences; that is, $x = (x_n) \in s$ if $x_n = 0$ for all but finitely many n. Show that s is not complete under the sup norm $\|x\|_\infty = \sup_n |x_n|$.

34. Prove that a normed vector space X is complete if and only if every sequence $(x_n)$ in X satisfying $\|x_n - x_{n+1}\| < 2^{-n}$, for all n, converges to a point of X.

35. Prove that a normed vector space X is complete if and only if its closed unit ball $B = \{x \in X : \|x\| \le 1\}$ is complete.


Fixed Points 

Completeness is a useful property to have around if you are interested in solving equations. How so? Well, think about the sorts of tricks that we use in R. How, for example, would you compute $\sqrt{2}$ “by hand”? You would most likely start by finding an approximate solution to the equation $x^2 = 2$ and then look for ways to improve your estimate.

Most numerical techniques give, in fact, a sequence of “better and better” approxi- 
mate solutions, where “better and better” typically means that the error in approximation 
gets smaller. The completeness of R affords us such luxuries; we can effectively pro- 
claim the existence of solutions without necessarily finding them! Once we have a 
Cauchy sequence of approximate solutions, completeness will finish the job. 

The same holds true in any complete space. We can effectively solve certain “ab- 
stract” equations by simply displaying a Cauchy sequence of approximate solutions. 
One such technique, called the method of successive approximations, is used in the 
standard proof of existence for solutions to differential equations and is generally cred- 
ited to Picard in 1890. (But the technique itself goes back at least to Liouville, who 
first published it in 1838, and it may have even been known to Cauchy.) We will see an 
example of this method shortly. 

The modern metric space version of the method of successive approximations was explicitly stated in Banach’s thesis in 1922. In this setting it is most often referred to as Banach’s contraction mapping principle. A map $f : M \to M$ on a metric space $(M, d)$ is called a contraction (or, better still, a strict contraction) if there is some constant α with $0 < \alpha < 1$ such that $d(f(x), f(y)) \le \alpha\, d(x, y)$ is satisfied for all $x, y \in M$. That is, a contraction shrinks the distance between pairs of points by a factor strictly less than 1. Please note that any contraction is automatically continuous (since it is Lipschitz).

Banach’s approach seeks to solve an “abstract” equation of the form $f(x) = x$ (this is more general than it might appear). That is, we look for a fixed point for f. If f is a contraction defined on a complete metric space, we can even prescribe a sequence of approximate solutions:

Theorem 7.13. Let $(M, d)$ be a complete metric space, and let $f : M \to M$ be a (strict) contraction. Then, f has a unique fixed point. Moreover, given any point $x_0 \in M$, the sequence of functional iterates $(f^n(x_0))$ always converges to the fixed point for f.

[The notation $f^n$ means the composition of f with itself n times: $f \circ f \circ \cdots \circ f$. For example, $f^2(x) = f(f(x))$, $f^3(x) = f(f^2(x))$, and so on. The sequence of functional iterates $(f^n(x))$ is called the orbit of x under f.]

proof. Let $x_0$ be any point in M, and consider the sequence $(f^n(x_0))$.

If $(f^n(x_0))$ converges, we are done. Indeed, if $x = \lim_{n \to \infty} f^n(x_0)$, then, since f is continuous, we have $f(x) = \lim_{n \to \infty} f(f^n(x_0)) = \lim_{n \to \infty} f^{n+1}(x_0) = x$. And this x is unique, for if y is also a fixed point for f, then $d(x, y) = d(f(x), f(y)) \le \alpha\, d(x, y)$, which forces $d(x, y) = 0$.

So our goal is clear: We need to show that $(f^n(x_0))$ is a Cauchy sequence. But:

$$\begin{aligned} d(f^{n+1}(x_0), f^n(x_0)) &\le \alpha\, d(f^n(x_0), f^{n-1}(x_0)) \\ &\le \alpha^2\, d(f^{n-1}(x_0), f^{n-2}(x_0)) \\ &\;\;\vdots \\ &\le \alpha^n\, d(f(x_0), x_0) = C\alpha^n. \end{aligned}$$

And so for m > n the triangle inequality yields

$$d(f^{m+1}(x_0), f^n(x_0)) \le \sum_{k=n}^{m} d(f^{k+1}(x_0), f^k(x_0)) \le C \sum_{k=n}^{m} \alpha^k. \tag{7.1}$$

But since $0 < \alpha < 1$, we have $\sum_{k=n}^m \alpha^k \to 0$ as $m, n \to \infty$. (Why?) Thus, $(f^n(x_0))$ is Cauchy. □

Note that the proof of Theorem 7.13 even gives us a rough estimate for the error in approximation. If we pick an initial “guess” $x_0$ for the fixed point x, then, by letting $m \to \infty$ in equation (7.1), we get

$$d(f^n(x_0), x) \le d(f(x_0), x_0) \sum_{k=n}^{\infty} \alpha^k = d(f(x_0), x_0)\, \frac{\alpha^n}{1 - \alpha}.$$
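The iteration and the error estimate can be tried on a concrete contraction. The example is our own choice, not the book's: $f = \cos$ maps $[0, 1]$ into $[\cos 1, 1] \subset [0, 1]$ and satisfies $|f'(x)| = |\sin x| \le \sin 1 < 1$ there, so by the mean value theorem (as in Example 7.14 below) it is a strict contraction with $\alpha = \sin 1$. The sketch compares the actual error of $f^n(x_0)$ with the a priori bound $d(f(x_0), x_0)\,\alpha^n/(1 - \alpha)$; the helper names are hypothetical.

```python
import math


def orbit(f, x0, n):
    """Return the list [x0, f(x0), f^2(x0), ..., f^n(x0)]."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs


def a_priori_bound(alpha, x0, x1, n):
    """Error estimate d(f^n(x0), x) <= d(f(x0), x0) * alpha**n / (1 - alpha)."""
    return abs(x1 - x0) * alpha ** n / (1 - alpha)


alpha = math.sin(1.0)  # Lipschitz constant of cos on [0, 1]
xs = orbit(math.cos, 0.5, 200)
# After 200 steps the a priori bound itself is below 1e-14, so the last
# iterate serves as a stand-in for the exact fixed point.
x_star = xs[-1]

for n in (5, 10, 20, 40):
    print(n, abs(xs[n] - x_star), a_priori_bound(alpha, xs[0], xs[1], n))
```

The bound is usually pessimistic, since near the fixed point the local contraction factor $\sin x^* \approx 0.67$ is smaller than $\sin 1 \approx 0.84$, but it is always valid.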

Example 7.14 

Suppose that $f : [a, b] \to [a, b]$ is continuous on $[a, b]$, differentiable on $(a, b)$, and has $|f'(x)| \le \alpha < 1$ for all $a < x < b$. Then it follows from the mean value theorem that $|f(x) - f(y)| \le \alpha |x - y|$ for all $x, y \in [a, b]$ and, hence, that f has a unique fixed point. See Figures 7.1 and 7.2.

[Figure 7.1: The case $0 < f' < 1$.]

[Figure 7.2: The case $-1 < f' < 0$.]

EXERCISES

36. The function $f(x) = x^2$ has two obvious fixed points: $p_0 = 0$ and $p_1 = 1$. Show that there is a $0 < \delta < 1$ such that $|f(x) - p_0| < |x - p_0|$ whenever $|x - p_0| < \delta$, $x \ne p_0$. Conclude that $f^n(x) \to p_0$ whenever $|x - p_0| < \delta$, $x \ne p_0$. This means that $p_0$ is an attracting fixed point for f; every orbit that starts out near 0 converges to 0. In contrast, find a $\delta > 0$ such that if $|x - p_1| < \delta$, $x \ne p_1$, then $|f(x) - p_1| > |x - p_1|$. This means that $p_1$ is a repelling fixed point for f; orbits that start out near 1 are pushed away from 1. In fact, given any $x \ne \pm 1$, we have $f^n(x) \not\to 1$.

37. Suppose that $f : (a, b) \to (a, b)$ has a fixed point p in $(a, b)$ and that f is differentiable at p. If $|f'(p)| < 1$, prove that p is an attracting fixed point for f. If $|f'(p)| > 1$, prove that p is a repelling fixed point for f.

38.

(a) Let $f(x) = \arctan x$. Show that $f'(0) = 1$ and that 0 is an attracting fixed point for f.

(b) Let $g(x) = x^3 + x$. Show that $g'(0) = 1$ and that 0 is a repelling fixed point for g.

(c) Let $h(x) = x^2 + 1/4$. Show that $h'(1/2) = 1$ and that 1/2 is a fixed point for h that is neither attracting nor repelling.

39. The cubic $x^3 - x - 1$ has a unique real root $x_0$ with $1 < x_0 < 2$. Find it! [Hint: Iterating the function $f(x) = x^3 - 1$ won’t work! Why?]


Example 7.15 

We’ll show how Theorem 7.13 can be used to find an estimate for, say, $\sqrt[3]{5}$. That is, we’ll solve the equation $F(x) = x^3 - 5 = 0$. Now it is clear that $1 < \sqrt[3]{5} < 2$, so let’s consider F as a map on the interval [1, 2]. And since the equation $F(x) = 0$ isn’t quite appropriate, let’s consider the equivalent equation $f(x) = x$, where $f(x) = x - \lambda F(x)$ for some suitably chosen $\lambda \in \mathbb{R}$. The claim is that it’s possible to find $\lambda > 0$ such that (i) $f : [1, 2] \to [1, 2]$, and (ii) $|f'(x)| \le \alpha < 1$ for $1 \le x \le 2$. In fact, a bit of experimentation will convince you that any $0 < \lambda < 1/6$ will do. Let’s try $\lambda = 1/8$. Table 7.1 displays a few iterations of the scheme $x_{n+1} = f(x_n) = x_n - (x_n^3 - 5)/8$, starting with $x_0 = 1.5$. The last value is accurate to at least six places. Roughly speaking, each iteration increases the accuracy by one decimal place. Not bad.

Table 7.1

$x_n$             $f(x_n)$
1.5               1.703125
1.703125          1.7106070518
1.7106070518      1.7099147854
1.7099147854      1.7099818467
1.7099818467      1.7099753773
1.7099753773      1.7099760016
1.7099760016      1.7099759414
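Table 7.1 can be regenerated with a few lines of Python. This is a sketch: `f` and its parameter `lam` are our names for the map $f(x) = x - \lambda F(x)$ with $\lambda = 1/8$, and the printed values are ordinary double-precision results, which should agree with the table to the digits shown.

```python
def f(x, lam=1 / 8):
    """The map f(x) = x - lam * F(x) with F(x) = x**3 - 5, as in Example 7.15."""
    return x - lam * (x ** 3 - 5)


x = 1.5
rows = []
for _ in range(7):
    fx = f(x)
    rows.append((x, fx))
    x = fx

print(f"{'x_n':>14}  {'f(x_n)':>14}")
for xn, fxn in rows:
    print(f"{xn:14.10f}  {fxn:14.10f}")
print("5 ** (1/3) =", 5 ** (1 / 3))
```

Near the root, $f'(x) = 1 - 3x^2/8 \approx -0.1$, which is why the error shrinks by roughly a factor of ten (and changes sign) at each step.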


EXERCISE 

40. Extend the result in Example 7.15 as follows: Suppose that $F : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$, differentiable in $(a, b)$, and satisfies $F(a) < 0$, $F(b) > 0$, and $0 < K_1 \le F'(x) \le K_2$. Show that there is a unique solution to the equation $F(x) = 0$. [Hint: Consider the equation $f(x) = x$, where $f(x) = x - \lambda F(x)$ for some suitably chosen λ.]


Under suitable conditions on f, the same technique can be applied to the problem of existence and uniqueness of the solution to the initial value problem:

$$y' = f(x, y), \qquad y(0) = y_0.$$

For example, if f is continuous in some rectangle containing $(0, y_0)$ in its interior, and if f is Lipschitz in its second variable, $|f(x, y) - f(x, z)| \le K|y - z|$, for some constant K, then a unique solution exists, at least in some small neighborhood of x = 0. This fact was first observed by Lipschitz himself (hence the name Lipschitz condition), but Lipschitz did not have metric spaces at his disposal. Most modern proofs use some form of Banach’s contraction mapping principle (often in the form of the method of successive approximations).

We will not give the full details of the proof here, but we will at least show how Banach’s theorem enters the picture. For this we will want to rephrase the problem as a fixed-point problem on some complete metric space. First notice that by integrating both sides of the differential equation we get

$$y(x) = y_0 + \int_0^x f(t, y(t))\,dt \qquad (x \ge 0).$$

That is, we need a fixed point for the map $\varphi \mapsto F(\varphi)$, where

$$(F(\varphi))(x) = y_0 + \int_0^x f(t, \varphi(t))\,dt.$$


For simplicity, let’s assume that f is defined and continuous on all of $\mathbb{R}^2$ (and still Lipschitz in its second variable). Then the integral on the right-hand side of this formula is well defined for any continuous function φ. Let’s consider F as a map on $C[0, \delta]$, where $\delta > 0$ will be specified shortly. Next we’ll check that F is a Lipschitz map on $C[0, \delta]$. For any $0 \le x \le \delta$, note that

$$\begin{aligned} |(F(\varphi))(x) - (F(\psi))(x)| &= \left| \int_0^x f(t, \varphi(t))\,dt - \int_0^x f(t, \psi(t))\,dt \right| \\ &\le \int_0^x |f(t, \varphi(t)) - f(t, \psi(t))|\,dt \\ &\le K \int_0^x |\varphi(t) - \psi(t)|\,dt \\ &\le Kx \cdot \max_{0 \le t \le x} |\varphi(t) - \psi(t)| \\ &\le K\delta\, \|\varphi - \psi\|_\infty. \end{aligned}$$

It follows that $\|F(\varphi) - F(\psi)\|_\infty \le K\delta\, \|\varphi - \psi\|_\infty$. Thus, F is a strict contraction on $C[0, \delta]$ provided that δ is chosen to satisfy $K\delta < 1$ and, in this case, since $C[0, \delta]$ is complete (Example 7.7 (f)), Theorem 7.13 shows that F has a unique fixed point in $C[0, \delta]$.


Example 7.16 

Consider the initial value problem $y' = 2x(1 + y)$, $y(0) = 0$. By integrating both sides of the differential equation, we see that we need a function φ satisfying $\varphi(x) = \int_0^x 2t(1 + \varphi(t))\,dt = (F(\varphi))(x)$. The method of successive approximations amounts to taking an initial “guess” at the solution, say $\varphi_0 = 0$, and iterating F. Thus, $\varphi_1(x) = \int_0^x 2t(1 + 0)\,dt = x^2$. Next, $\varphi_2(x) = \int_0^x 2t(1 + t^2)\,dt = x^2 + x^4/2$. Another iteration would yield $\varphi_3(x) = x^2 + x^4/2 + x^6/6$. And so on. Finally, induction yields

$$\varphi(x) = \sum_{k=1}^{\infty} \frac{x^{2k}}{k!} = e^{x^2} - 1.$$

This solution is valid on all of R (and agrees, naturally, with the solution obtained by separation of variables).
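The successive approximations in Example 7.16 can be computed exactly, because each iterate is a polynomial. In this sketch (our own representation, not the book's), a polynomial is a dictionary `{power: coefficient}` with `fractions.Fraction` coefficients, and the hypothetical `picard_step` applies $(F(\varphi))(x) = \int_0^x 2t(1 + \varphi(t))\,dt$ term by term.

```python
import math
from fractions import Fraction


def picard_step(phi):
    """Apply (F(phi))(x) = integral from 0 to x of 2t(1 + phi(t)) dt.

    Polynomials are dicts {power: coefficient} with exact Fraction coefficients.
    """
    # Expand the integrand 2t(1 + phi(t)) = 2t + sum of 2c t^(p+1).
    integrand = {1: Fraction(2)}
    for p, c in phi.items():
        integrand[p + 1] = integrand.get(p + 1, Fraction(0)) + 2 * c
    # Integrate term by term from 0 to x: t^q becomes x^(q+1) / (q+1).
    return {q + 1: c / (q + 1) for q, c in integrand.items()}


def evaluate(phi, x):
    """Evaluate the polynomial phi at a float x."""
    return sum(float(c) * x ** p for p, c in phi.items())


phi = {}  # phi_0 = 0
for n in range(1, 6):
    phi = picard_step(phi)
    print(n, sorted(phi.items()))

print(evaluate(phi, 0.5), math.exp(0.25) - 1)
```

After n steps the coefficients are exactly $1/k!$ on $x^{2k}$ for $k = 1, \ldots, n$, the partial sums of the series for $e^{x^2} - 1$.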


EXERCISES 

41. Let M be complete and let $f : M \to M$ be continuous. If $f^k$ is a strict contraction for some integer $k > 1$, show that f has a unique fixed point.

42. Define $T : C[0, 1] \to C[0, 1]$ by $(T(f))(x) = \int_0^x f(t)\,dt$. Show that T is not a strict contraction while $T^2$ is. What is the fixed point of T?

43. Show that each of the hypotheses of the contraction mapping principle is necessary by finding examples of a space M and a map $f : M \to M$ having no fixed point where:

(a) M is incomplete (but f is still a strict contraction).

(b) f satisfies only $d(f(x), f(y)) < d(x, y)$ for all $x \ne y$ (but M is still complete).


Completions 

Completeness is a central theme in this book; it will return frequently. It may comfort 
you to know that every metric space can be “completed.” In effect, this means that by 
tacking on a few “missing” limit points we can make an incomplete space complete. 
While the approach that we will take may not suggest anything so simple as adding a 
few points here and there, it is nevertheless the picture to bear in mind. In time, all will 
be made clear! 

First, a definition. A metric space $(\bar{M}, \bar{d})$ is called a completion for $(M, d)$ if

(i) $(\bar{M}, \bar{d})$ is complete, and

(ii) $(M, d)$ is isometric to a dense subset of $(\bar{M}, \bar{d})$.

If M is already complete, then certainly $\bar{M} = M$ works. Except for this easy case, there is no obvious reason why completions should exist at all.

Formally, condition (ii) means that there is some map $i : M \to \bar{M}$ such that $d(x, y) = \bar{d}(i(x), i(y))$ for all $x, y \in M$, and such that $i(M)$ is a dense subset of $\bar{M}$. Informally, condition (ii) says that we may regard M as an actual subset of $\bar{M}$ (in which case i is just the inclusion map from M into $\bar{M}$), that $\bar{d}|_{M \times M} = d$ (i.e., the relative metric that M inherits as a subset of $\bar{M}$ is just d), and that M is dense in $\bar{M}$.

The requirement that M is dense in $\bar{M}$ is added to ensure uniqueness (more on this in a moment), but it is actually easy to come by. The real work comes in finding any complete space $(N, \rho)$ that will accept M, isometrically, as a subset, for then we simply take $\bar{M} = \operatorname{cl}_N M$. Notice that $\bar{M}$ is a closed subset of a complete space and hence is complete, and that M is clearly dense in $\bar{M}$.

Given a metric space $M$, we need to construct a complete space that is “big enough” to contain $M$ isometrically. One way to accomplish this is to consider the collection of all bounded, real-valued functions on $M$. (This is roughly analogous to using the power set of $M$ when looking for a set that is bigger than $M$.) Here’s how we’ll do it: Given any set $M$, we will define $\ell_\infty(M)$ to be the collection of all bounded, real-valued functions $f : M \to \mathbb{R}$, and we will define a norm on $\ell_\infty(M)$ in the obvious way:

$$\|f\|_\infty = \sup_{x \in M} |f(x)|.$$

This notation is consistent with that used for $\ell_\infty$ since, after all, a bounded sequence of real numbers is nothing other than a bounded function on $\mathbb{N}$. That is, $\ell_\infty = \ell_\infty(\mathbb{N})$.

The fact that $\|\cdot\|_\infty$ is a norm on $\ell_\infty(M)$ uses the same proof that we used for $\ell_\infty$. And the fact that $\ell_\infty(M)$ is complete under this norm again uses the same proof that we used for $\ell_\infty$. (See Exercises 18 and 44 and Exercise 3.21.) All of the fighting takes place in $\mathbb{R}$ and has little to do with the sets $M$ or $\mathbb{N}$. It might help if you think of the “$M$” in $\ell_\infty(M)$ as simply an index set. Any index set with the same cardinality as $M$ would suit our purposes just as well.

To find a completion for $M$, then, it suffices to show that $(M, d)$ embeds isometrically into $\ell_\infty(M)$. Thus, each point $x \in M$ will have to correspond to some real-valued function on $M$. An obvious choice might be to associate each $x$ with the function $t \mapsto d(x, t)$. Now this function is not necessarily bounded, but it is essentially the right choice. We just have a few details to tidy up.

Lemma 7.17. Let $(M, d)$ be any metric space. Then $M$ is isometric to a subset of $\ell_\infty(M)$.

Proof. Fix any point $a \in M$. To each $x \in M$ we associate an element $f_x \in \ell_\infty(M)$ by setting

$$f_x(t) = d(x, t) - d(a, t), \qquad t \in M.$$

Note that $f_x$ is bounded since $|f_x(t)| = |d(x, t) - d(a, t)| \le d(x, a)$, a number that does not depend on $t$. That is, $\|f_x\|_\infty \le d(x, a)$. All that remains is to check that the correspondence $x \mapsto f_x$ is actually an isometry. But $\|f_x - f_y\|_\infty = \sup_{t \in M} |d(x, t) - d(y, t)| \le d(x, y)$, from the triangle inequality, and $|d(x, t) - d(y, t)| = d(x, y)$ when $t = x$ or $t = y$. Thus, $\|f_x - f_y\|_\infty = d(x, y)$. □
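The embedding in Lemma 7.17 is easy to test numerically when $M$ is finite, since then every function on $M$ is bounded and the supremum in $\|\cdot\|_\infty$ is a maximum. The Python sketch below is only an illustration: the sample points, the choice of the Euclidean metric on the plane, and the helper names `euclid`, `kuratowski_embedding`, and `sup_distance` are our own assumptions, not part of the text. It checks both facts from the proof, $\|f_x\|_\infty \le d(x, a)$ and $\|f_x - f_y\|_\infty = d(x, y)$, up to floating-point rounding.

```python
import itertools
import math

def euclid(p, q):
    """Euclidean distance in the plane (our sample metric d)."""
    return math.dist(p, q)

def kuratowski_embedding(points, d, a):
    """Map each x to f_x, where f_x(t) = d(x, t) - d(a, t).

    M is finite here, so f_x is stored as a tuple of values indexed by t in points.
    """
    return {x: tuple(d(x, t) - d(a, t) for t in points) for x in points}

def sup_distance(f, g):
    """||f - g||_inf for functions on a finite index set (the sup is a max)."""
    return max(abs(u - v) for u, v in zip(f, g))

points = [(0.0, 0.0), (3.0, 4.0), (1.0, -2.0), (-5.0, 1.5), (2.5, 2.5)]
a = points[0]  # the base point a; any choice works
f = kuratowski_embedding(points, euclid, a)

for x in points:
    # |f_x(t)| <= d(x, a) for every t, so ||f_x||_inf <= d(x, a)
    assert max(abs(v) for v in f[x]) <= euclid(x, a) + 1e-12

for x, y in itertools.combinations(points, 2):
    # x -> f_x preserves distances: ||f_x - f_y||_inf = d(x, y)
    assert math.isclose(sup_distance(f[x], f[y]), euclid(x, y), rel_tol=1e-9)

print("isometry verified on", len(points), "points")
```

Changing the base point `a` changes every $f_x$ by the same function $t \mapsto d(a', t) - d(a, t)$, so the pairwise distances, and hence the isometry, are unaffected.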

Lemma 7.17 shows that $M$ is isometric to the subset $\{f_x : x \in M\}$ of $\ell_\infty(M)$. We may define a completion of $M$ by taking $\widehat{M}$ to be the closure of $\{f_x : x \in M\}$ in $\ell_\infty(M)$. Seem a bit complicated? Would it surprise you to learn that this completion is essentially the only one available? Well, prepare yourself!

Theorem 7.18. If $M_1$ and $M_2$ are completions of $M$, then $M_1$ and $M_2$ are isometric.

Proof. For simplicity of notation, let’s suppose that $M$ is actually a subset of $M_1$ and $M_2$ (and dense in each, of course). This will make for fewer arrows to chase in the diagram below. The claim is that the identity $i$ on $M$ “lifts” to an isometry $f$ from $M_1$ onto $M_2$.

$$\begin{array}{ccc} M_1 & \xrightarrow{\;f\;} & M_2 \\ \cup & & \cup \\ M & \xrightarrow{\;i\;} & M \end{array}$$

Here’s how. We will define $f : M_1 \to M_2$ through a series of observations. First, given $x \in M_1$, there is some sequence $(x_n)$ in $M$ such that $x_n \to x$ in $M_1$, because $M$ is dense in $M_1$. In particular, $(x_n)$ is Cauchy in $M_1$. But then $(x_n)$ is also Cauchy in $M_2$. (Why? Recall that $(x_n) \subset M \subset M_2$.) Hence $x_n \to y$ in $M_2$, for some $y \in M_2$, because $M_2$ is complete. Now set $f(x) = y$. In other words, put $f(M_1\text{-}\lim x_n) = M_2\text{-}\lim x_n$.

We first check that $f$ is well defined. If $(x_n)$ and $(z_n)$ are sequences in $M$, and if both converge to $x$ in $M_1$, then both must also converge to the same point $y$ in $M_2$ since

$$d_2(x_n, z_n) = d_1(x_n, z_n) = d(x_n, z_n) \to 0,$$

where we’ve written $d_1$ for the metric in $M_1$ and $d_2$ for the metric in $M_2$ (recall that both agree with $d$ on pairs from $M$).

Now that we know that $f$ is well defined, we also know that $f|_M$ is the identity map on $M$; that is, $f$ is an extension of the identity on $M$. This is more or less obvious, since, if $x \in M$, we have the constant sequence, $x_n = x$ for all $n$, at our disposal.

Next let’s check that $f$ is onto. Given $y \in M_2$, there is some sequence $(x_n)$ in $M$ such that $x_n \to y$ in $M_2$ (because $M$ is dense in $M_2$). But, just as before, this means that $x_n \to x$ in $M_1$ for some $x$. Clearly, $y = f(x)$.

Finally, we check that $f$ is an isometry. Given $x, y \in M_1$, choose sequences $(x_n)$, $(y_n)$ in $M$ such that $x_n \to x$ in $M_1$ and $y_n \to y$ in $M_1$. Then, $x_n \to f(x)$ in $M_2$ and $y_n \to f(y)$ in $M_2$. Consequently,

$$d_1(x, y) = \lim_{n \to \infty} d(x_n, y_n) = d_2(f(x), f(y)).$$

(Why?) □

The proof of Theorem 7.18 allows us to make precise the notion of “adding on” a few points to make $M$ complete. The points that are “added on” are limit points for entire collections of (nonconvergent) Cauchy sequences. Each point $x$ in the completion $\widehat{M}$ corresponds to the collection of all Cauchy sequences in $M$ that converge to $x$; given one such Cauchy sequence $(x_n)$, any other Cauchy sequence $(y_n)$ in the same collection must be “equivalent” to $(x_n)$ in the sense that $d(x_n, y_n) \to 0$. In fact, this is the standard construction; we define an equivalence relation on the class $\mathcal{C}$ of all Cauchy sequences in $M$ by declaring $(x_n)$ and $(y_n)$ to be equivalent whenever $d(x_n, y_n) \to 0$. The completion of $M$, then, is the set of equivalence classes of $\mathcal{C}$ under this relation.
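For a concrete instance of this equivalence relation, take $M = \mathbb{Q}$, whose completion is $\mathbb{R}$. The following Python sketch is an illustration of our own (the two sequences and the function names are our choices); it uses the standard `fractions` module to build two different Cauchy sequences of rationals that both “aim at” $\sqrt{2}$: decimal truncations and Newton’s-method iterates. Neither converges in $\mathbb{Q}$, yet $d(x_n, y_n) = |x_n - y_n| \to 0$, so they lie in the same equivalence class, the one that plays the role of the point $\sqrt{2}$.

```python
from fractions import Fraction
from math import isqrt

def decimal_truncation(n):
    """x_n: sqrt(2) truncated to n decimal places, as an exact rational."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)

def newton_iterate(n):
    """y_n: n steps of Newton's method y -> (y + 2/y) / 2 for y**2 = 2, from y = 1."""
    y = Fraction(1)
    for _ in range(n):
        y = (y + 2 / y) / 2
    return y

for n in range(1, 8):
    x_n, y_n = decimal_truncation(n), newton_iterate(n)
    # every term is rational and none squares to 2: the common "limit" is missing from Q
    assert x_n * x_n != 2 and y_n * y_n != 2
    # yet d(x_n, y_n) = |x_n - y_n| < 2 / 10**n, so the two sequences are equivalent
    assert abs(x_n - y_n) < Fraction(2, 10 ** n)
    print(n, float(abs(x_n - y_n)))
```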

In the next chapter we will use a technique that is similar to the one used in the 
proof of Theorem 7.18 to construct extensions for maps other than isometries. The key
ingredients will still be a dense domain of definition and the preservation of Cauchy 
sequences. 


EXERCISES 

Except where noted, $M$ is an arbitrary metric space with metric $d$.

44. Given any set $M$, check that $\ell_\infty(M)$ is a complete normed vector space.

45. If $M$ and $N$ are equivalent sets, show that $\ell_\infty(M)$ and $\ell_\infty(N)$ are isometric. [Hint: If $g : N \to M$ is any map, then $f \mapsto f \circ g$ defines a map from $\ell_\infty(M)$ to $\ell_\infty(N)$. How does this help?]

46. If $A$ is a dense subset of a metric space $(M, d)$, show that $(A, d)$ and $(M, d)$ have the same completion (isometrically). [Hint: If $\widehat{M}$ is the completion for $M$, then $A$ is dense in $\widehat{M}$. Why?]

47. A function $f : (M, d) \to (N, \rho)$ is said to be uniformly continuous if $f$ is continuous and if, given $\varepsilon > 0$, there is always a single $\delta > 0$ such that $\rho(f(x), f(y)) < \varepsilon$ for any $x, y \in M$ with $d(x, y) < \delta$. That is, $\delta$ is allowed to depend on $f$ and $\varepsilon$ but not on $x$ or $y$. Prove that any Lipschitz map is uniformly continuous.

48. Prove that a uniformly continuous map sends Cauchy sequences into Cauchy 
sequences. 

49. Suppose that $f : \mathbb{Q} \to \mathbb{R}$ is Lipschitz. Prove that $f$ extends uniquely to a continuous function $g : \mathbb{R} \to \mathbb{R}$. [Hint: Given $x \in \mathbb{R}$, define $g(x) = \lim_{n \to \infty} f(r_n)$, where $(r_n)$ is a sequence of rationals converging to $x$.]

50. Given a point $a \in M$ and a subset $A \subset M$, show that each of the functions $x \mapsto d(x, a)$ and $x \mapsto d(x, A)$ is uniformly continuous.

51. Two metric spaces $(M, d)$ and $(N, \rho)$ are said to be uniformly homeomorphic if there is a one-to-one and onto map $f : M \to N$ such that both $f$ and $f^{-1}$ are uniformly continuous. In this case we say that $f$ is a uniform homeomorphism. Prove that completeness is preserved by uniform homeomorphisms.


Just as we have solved one problem, we have raised another. We now know that 
every metric space has a unique completion (at least if we agree to identify isometric 
spaces). But suppose that the incomplete metric space that we start with carries some 
extra structure. Say that we need the completion of an incomplete normed vector space,
for example. Will we have to give up the vector space structure to gain completeness?
In other words, is the completion of a normed vector space still a normed vector space?
In still other words, could the completion be more trouble than it’s worth?

Luck is with us on this question; the completion of a normed vector space is indeed a 
Banach space. The proof is not terribly hard, but it is rather tedious, with lots of details 
to verify. The key steps, though, are easy to describe. 

Given a normed vector space $X$ and its completion $\widehat{X}$, we need to suitably define both addition and scalar multiplication on $\widehat{X}$ (and check that $\widehat{X}$ is a vector space under these), and we have to define a suitable norm on $\widehat{X}$. So, suppose that we are handed $x, y \in \widehat{X}$, and scalars $\alpha, \beta \in \mathbb{R}$. How do we define $\alpha x + \beta y$? Well, choose sequences $(x_n)$, $(y_n)$ in $X$ such that $x_n \to x$ and $y_n \to y$ in $\widehat{X}$, and define

$$\alpha x + \beta y = \lim_{n \to \infty} (\alpha x_n + \beta y_n).$$

(This makes sense because $(\alpha x_n + \beta y_n)$ is Cauchy in $X$: indeed, $\|(\alpha x_n + \beta y_n) - (\alpha x_m + \beta y_m)\| \le |\alpha|\,\|x_n - x_m\| + |\beta|\,\|y_n - y_m\|$.) After checking that this definition turns $\widehat{X}$ into a vector space, there is only one reasonable choice for a norm on $\widehat{X}$. We would set

$$\|x\| = \hat{d}(x, 0) = \lim_{n \to \infty} d(x_n, 0) = \lim_{n \to \infty} \|x_n\|$$

and check that this is actually a norm on $\widehat{X}$. (If so, then $\widehat{X}$ has to be complete under this norm; that is already determined by $\hat{d}$.) In this setting, $X$ is a dense linear subspace of $\widehat{X}$.
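As an illustration (our own example, not one from the text), take $X$ to be the space of finitely supported real sequences with the norm $\|x\| = (\sum_k x_k^2)^{1/2}$; its completion is $\ell_2$. The vector $(1, 1/2, 1/3, \ldots)$ lies in $\ell_2$ but not in $X$, and its truncations $x_n$ form a Cauchy sequence in $X$. The Python sketch below forms $\alpha x_n + \beta y_n$ for sample scalars and a second sample sequence, and shows numerically that successive terms get closer and that $\|x_n\|$ approaches $\pi/\sqrt{6}$, the $\ell_2$ norm of the limit (since $\sum 1/k^2 = \pi^2/6$). The scalars, truncation sizes, and helper names are assumptions made for the example, and a finite computation can only suggest these limits, not prove them.

```python
import math

def x_coord(k):
    """k-th coordinate of the target vector x = (1, 1/2, 1/3, ...) in l_2."""
    return 1.0 / k

def y_coord(k):
    """k-th coordinate of the target vector y = (1/2, 1/4, 1/8, ...) in l_2."""
    return 0.5 ** k

def truncate(coord, n):
    """The finitely supported vector (coord(1), ..., coord(n), 0, 0, ...), stored as a list."""
    return [coord(k) for k in range(1, n + 1)]

def norm2(v):
    """The l_2 norm of a finitely supported vector."""
    return math.sqrt(sum(t * t for t in v))

def combo(alpha, u, beta, v):
    """alpha*u + beta*v for finitely supported vectors, padding the shorter list with zeros."""
    m = max(len(u), len(v))
    u = u + [0.0] * (m - len(u))
    v = v + [0.0] * (m - len(v))
    return [alpha * s + beta * t for s, t in zip(u, v)]

alpha, beta = 2.0, -3.0
sizes = [10, 100, 1000, 10000]

# z_n = alpha*x_n + beta*y_n lies in X; the gaps between successive terms shrink
z = [combo(alpha, truncate(x_coord, n), beta, truncate(y_coord, n)) for n in sizes]
gaps = [norm2(combo(1.0, z[i + 1], -1.0, z[i])) for i in range(len(z) - 1)]
print(gaps)

# ||x|| = lim ||x_n||; here the limit is (sum 1/k^2)^(1/2) = pi / sqrt(6)
print(norm2(truncate(x_coord, 10 ** 5)), math.pi / math.sqrt(6))
```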


Notes and Remarks 

Fréchet introduced complete metric spaces in his thesis, Fréchet [1906], while Hausdorff
coined the term totally bounded. But much of what is in this chapter has its roots in
Cantor’s work: The nested set theorem for $\mathbb{R}$, a special case of Theorem 7.11(ii), is
generally credited to Cantor. The metric space version is due to Fréchet.

For more on the result in Exercise 30, see Kelley [1955]. Exercise 38 is taken 
from Gulick [1992]. Examples 7.14 and 7.15, along with Exercise 40, are based on 
the presentation in Kolmogorov and Fomin [1970]. Exercise 39 is adapted from an 
entertaining article by Cannon and Elich [1993]. For more applications of functional 
iteration and its relation to chaos and fractals, see Barnsley [1988], Devaney [1992], and 
Edgar [1990]. For a historical survey of functional iteration, see D. F. Bailey [1989]. 

Picard’s theorem appears in Picard [1890]. Banach’s observation on completeness 
for normed linear spaces (Theorem 7.12) and the contraction mapping principle (Theorem 7.13) are from his thesis, Banach [1922]. You will find even more applications of
Banach’s contraction mapping theorem in Copson [1968], including proofs of the inverse and implicit function theorems. For an interesting application to “crinkly” curves,
see Katsuura [1991]. 

For a brief survey of some of fixed point theory’s “greatest hits,” see Shaskin [1991].
Fixed point theory remains a hot research area; for a look at some of the recent devel- 
opments, see Goebel and Kirk [1990]. 
It was Hausdorff who first showed that every metric space has a completion, and
his proof is based on what he calls the Cantor-Méray theorem (the description of the
irrationals in terms of Cauchy sequences of rationals). The proof given here is a hybrid;
Lemma 7.17 is based on a proof given in Kuratowski [1935] (but see also Fréchet [1928]
and Kaplansky [1977]) while Theorem 7.18 (and the subsequent remarks) follows the
lines of Hausdorff’s original proof (see, for example, Hausdorff [1937]). Note that the
function $f_x$ used in the proof of Lemma 7.17 is actually a continuous function on $M$;
we will use this observation later to show that (under certain circumstances) $M$ embeds
isometrically into $C(M)$, the space of continuous real-valued functions on $M$.

We will have much more to say about uniform continuity (Exercise 47) and uniform
homeomorphisms (Exercise 51) in the next chapter.
Compact Metric Spaces

A metric space $(M, d)$ is said to be compact if it is both complete and totally bounded.
As you might imagine, a compact space is the best of all possible worlds. 

Examples 8.1 

(a) A subset $K$ of $\mathbb{R}$ is compact if and only if $K$ is closed and bounded. This fact is usually referred to as the Heine-Borel theorem. Hence, a closed bounded interval $[a, b]$ is compact. Also, the Cantor set $\Delta$ is compact. The interval $(0, 1)$, on the other hand, is not compact.

(b) A subset $K$ of $\mathbb{R}^n$ is compact if and only if $K$ is closed and bounded. (Why?)

(c) It is important that we not confuse the first two examples with the general case. Recall that the set $\{e_n : n \ge 1\}$ is closed and bounded in $\ell_\infty$ but not totally bounded, hence not compact. Taking this a step further, notice that the closed ball $\{x : \|x\|_\infty \le 1\}$ in $\ell_\infty$ is not compact, whereas any closed ball in $\mathbb{R}^n$ is compact.

(d) A subset of a discrete space is compact if and only if it is finite. (Why?) 

Just as with completeness and total boundedness, we will want to give several equivalent characterizations of compactness. In particular, since neither completeness nor total boundedness is preserved by homeomorphisms, our newest definition does not appear to be describing a topological property. Let’s remedy this immediately by giving a sequential characterization of compactness that will turn out to be invariant under homeomorphisms.

Theorem 8.2. $(M, d)$ is compact if and only if every sequence in $M$ has a subsequence that converges to a point in $M$.

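Proof.

$$\begin{array}{ccc}
\text{totally bounded} & \iff & \text{every sequence in } M \text{ has a Cauchy subsequence} \\
+ & & + \\
\text{complete} & \iff & \text{Cauchy sequences converge}
\end{array}$$

(For the backward implication, note that a Cauchy sequence with a convergent subsequence must itself converge.) □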
It is easy to believe that compactness is a valuable property for an analyst to have
available. Convergent sequences are easy to come by in a compact space; no fussing 
with difficult prerequisites here! If you happen on a nonconvergent sequence, just 
extract a subsequence that does converge and use that one instead. You couldn’t ask for 
more! 

Given a compact space, it is easy to decide which of its subsets are compact: 

Corollary 8.3. Let A be a subset of a metric space M. If A is compact, then A 
is closed in M. If M is compact and A is closed, then A is compact. 

Proof. Suppose that $A$ is compact, and let $(x_n)$ be a sequence in $A$ that converges to a point $x \in M$. Then, from Theorem 8.2, $(x_n)$ has a subsequence that converges to a point of $A$; since this subsequence also converges to $x$, we must have $x \in A$. Thus, $A$ is closed.

Next, suppose that $M$ is compact and that $A$ is closed in $M$. Given an arbitrary sequence $(x_n)$ in $A$, Theorem 8.2 supplies a subsequence of $(x_n)$ that converges to a point $x \in M$. But since $A$ is closed, we must have $x \in A$. Thus, $A$ is compact. □


EXERCISES 

Unless otherwise stated, $(M, d)$ denotes a generic metric space.

> 1. If $K$ is a nonempty compact subset of $\mathbb{R}$, show that $\sup K$ and $\inf K$ are elements of $K$.

> 2. Let $E = \{x \in \mathbb{Q} : 2 < x^2 < 3\}$, considered as a subset of $\mathbb{Q}$ (with its usual metric). Show that $E$ is closed and bounded but not compact.

3. If $A$ is compact in $M$, prove that $\operatorname{diam}(A)$ is finite. Moreover, if $A$ is nonempty, show that there exist points $x$ and $y$ in $A$ such that $\operatorname{diam}(A) = d(x, y)$.

4. If A and B are compact sets in M, show that A U B is compact. 

5. True or false? M is compact if and only if every closed ball in M is compact. 

6. If $A$ is compact in $M$ and $B$ is compact in $N$, show that $A \times B$ is compact in $M \times N$ (see Exercise 3.46).

7. If $K$ is a compact subset of $\mathbb{R}^2$, show that $K \subset [a, b] \times [c, d]$ for some pair of compact intervals $[a, b]$ and $[c, d]$.

8. Prove that the set $\{x \in \mathbb{R}^n : \|x\|_1 = 1\}$ is compact in $\mathbb{R}^n$ under the Euclidean norm.

9. Prove that $(M, d)$ is compact if and only if every infinite subset of $M$ has a limit point.

10. Show that the Heine-Borel theorem (closed, bounded sets in $\mathbb{R}$ are compact) implies the Bolzano-Weierstrass theorem. Conclude that the Heine-Borel theorem is equivalent to the completeness of $\mathbb{R}$.
11. Prove that compactness is not a relative property. That is, if $K$ is compact in $M$, show that $K$ is compact in any metric space that contains it (isometrically).

12. Show that the set $A = \{x \in \ell_2 : |x_n| \le 1/n,\ n = 1, 2, \ldots\}$ is compact in $\ell_2$. [Hint: First show that $A$ is closed. Next, use the fact that $\sum 1/n^2 < \infty$ to show that $A$ is “within $\varepsilon$” of the set $A \cap \{x \in \ell_2 : x_n = 0,\ n > N\}$.]

13. Given $c_n > 0$ for all $n$, prove that the set $\{x \in \ell_2 : |x_n| \le c_n,\ n \ge 1\}$ is compact in $\ell_2$ if and only if $\sum_{n=1}^\infty c_n^2 < \infty$.

14. Show that the Hilbert cube $H^\infty$ (Exercise 3.10) is compact. [Hint: First show that $H^\infty$ is complete (Exercise 7.24). Now, given $\varepsilon > 0$, choose $N$ so that $2^{-N} < \varepsilon$ and argue that $H^\infty$ is “within $\varepsilon$” of the set $\{x \in H^\infty : x_n = 0 \text{ for } n > N\}$.]

15. If $A$ is a totally bounded subset of a complete metric space $M$, show that $\overline{A}$ is compact in $M$. For this reason, totally bounded sets are sometimes called precompact or conditionally compact. In fact, any set with compact closure might be labeled precompact.

16. Show that a metric space $M$ is totally bounded if and only if its completion $\widehat{M}$ is compact.

> 17. If $M$ is compact, show that $M$ is also separable.

18. A collection $(U_\alpha)$ of open sets is called an open base for $M$ if every open set in $M$ can be written as a union of the $U_\alpha$. For example, the collection of all open intervals in $\mathbb{R}$ with rational endpoints is an open base for $\mathbb{R}$ (and this is even a countable collection). (Why?) Prove that $M$ has a countable open base if and only if $M$ is separable. [Hint: If $\{x_n\}$ is a countable dense set in $M$, consider the collection of open balls with rational radii centered at the $x_n$.]

19. Prove that $M$ is separable if and only if $M$ is homeomorphic to a totally bounded metric space (specifically, a subset of the Hilbert cube). [Hint: See Exercise 4.49.]


To show that compactness is indeed a topological property, let’s show that the continuous image of a compact set is again compact:

Theorem 8.4. Let $f : (M, d) \to (N, \rho)$ be continuous. If $K$ is compact in $M$, then $f(K)$ is compact in $N$.

Proof. Let $(y_n)$ be a sequence in $f(K)$. Then, $y_n = f(x_n)$ for some sequence $(x_n)$ in $K$. But, since $K$ is compact, $(x_n)$ has a convergent subsequence, say, $x_{n_k} \to x \in K$. Then, since $f$ is continuous, $y_{n_k} = f(x_{n_k}) \to f(x) \in f(K)$. Thus, $f(K)$ is compact. □
Theorem 8.4 gives us a wealth of useful information. In particular, it tells us that
real-valued continuous functions on compact spaces are quite well behaved: 

Corollary 8.5. Let $(M, d)$ be compact. If $f : M \to \mathbb{R}$ is continuous, then $f$ is bounded. Moreover, $f$ attains its maximum and minimum values.

Proof. $f(M)$ is compact in $\mathbb{R}$; hence it is closed and bounded. Moreover, $\sup f(M)$ and $\inf f(M)$ are actually elements of $f(M)$. (Why?) That is, there exist $x, y \in M$ such that $f(x) \le f(t) \le f(y)$ for all $t \in M$. (In this case we would write $f(x) = \min_{t \in M} f(t)$ and $f(y) = \max_{t \in M} f(t)$.) □


Corollary 8.6. If $f : [a, b] \to \mathbb{R}$ is continuous, then the range of $f$ is a compact interval $[c, d]$ for some $c, d \in \mathbb{R}$.

Corollary 8.7. If $M$ is a compact metric space, then $\|f\|_\infty = \max_{t \in M} |f(t)|$ defines a norm on $C(M)$, the vector space of continuous real-valued functions on $M$.


EXERCISES 

Throughout, M denotes a metric space with metric d. 

> 20. Let $E$ be a noncompact subset of $\mathbb{R}$. Find a continuous function $f : E \to \mathbb{R}$ that is (i) not bounded; (ii) bounded but has no maximum value.

21. Prove Corollary 8.6. 

22. If $M$ is compact and $f : M \to N$ is continuous, prove that $f$ is a closed map.

> 23. Suppose that $M$ is compact and that $f : M \to N$ is continuous, one-to-one, and onto. Prove that $f$ is a homeomorphism.

24. Let $f : [0, 1] \to [0, 1] \times [0, 1]$ be continuous and one-to-one. Show that $f$ cannot be onto. Moreover, show that the range of $f$ is nowhere dense in $[0, 1] \times [0, 1]$. [Hint: The range of $f$ is closed (why?); if it has nonempty interior, then it contains a closed rectangle. Argue that this rectangle is the image of some subinterval of $[0, 1]$.]

25. Let $V$ be a normed vector space, and let $x \ne y \in V$. Show that the map $f(t) = x + t(y - x)$ is a homeomorphism from $[0, 1]$ into $V$. The range of $f$ is the line segment joining $x$ and $y$; it is often written $[x, y]$.

26. If $f : \mathbb{R} \to \mathbb{R}$ is both continuous and open, show that $f$ is strictly monotone.

27. Given $f : [a, b] \to \mathbb{R}$, define $G : [a, b] \to \mathbb{R}^2$ by $G(x) = (x, f(x))$ (the range of $G$ is the graph of $f$). Prove that the following are equivalent: (i) $f$ is continuous; (ii) $G$ is continuous; (iii) the graph of $f$ is a compact subset of $\mathbb{R}^2$. [Hint: $f$ is continuous if, whenever $x_n \to x$, there is a subsequence of $(f(x_n))$ that converges to $f(x)$. Why?]
28. Let $f : [a, b] \to [a, b]$ be continuous. Show that $f$ has a fixed point. Try to prove this without appealing to the intermediate value theorem. [Hint: Consider the function $g(x) = |x - f(x)|$.]

29. Let $M$ be a compact metric space and suppose that $f : M \to M$ satisfies $d(f(x), f(y)) < d(x, y)$ whenever $x \ne y$. Show that $f$ has a fixed point. [Hint: First note that $f$ is continuous; next, consider $g(x) = d(x, f(x))$.]


Corollary 8.7 would seem to suggest that compactness is the analogue of “finite” 
that we talked about at the end of Chapter Five. To better appreciate this, we will 
need a slightly more esoteric characterization of compactness. A bit of preliminary 
detail-checking will ease the transition. 

Lemma 8.8. In a metric space $M$, the following are equivalent:

(a) If $\mathcal{G}$ is any collection of open sets in $M$ with $\bigcup\{G : G \in \mathcal{G}\} \supset M$, then there are finitely many sets $G_1, \ldots, G_n \in \mathcal{G}$ with $\bigcup_{i=1}^n G_i \supset M$.

(b) If $\mathcal{F}$ is any collection of closed sets in $M$ such that $\bigcap_{i=1}^n F_i \ne \emptyset$ for all choices of finitely many sets $F_1, \ldots, F_n \in \mathcal{F}$, then $\bigcap\{F : F \in \mathcal{F}\} \ne \emptyset$.

The proof of Lemma 8.8 is left as an exercise; as you might guess, De Morgan’s 
laws do all of the work. The first condition is usually paraphrased by saying, in less 
than perfect English, “every open cover has a finite subcover.” The second condition
is abbreviated by saying “every collection of closed sets with the finite intersection
property has nonempty intersection.” These may at first seem to be unwieldy statements
to work with, but each is worth the trouble. Here’s why we care: Condition (a) implies
that $M$ is totally bounded because, for any $\varepsilon > 0$, the collection $\mathcal{G} = \{B_\varepsilon(x) : x \in M\}$
is an open cover for $M$. Condition (b) implies that $M$ is complete because it easily implies
the nested set theorem (if $F_1 \supset F_2 \supset \cdots$ are nonempty closed sets, then $\bigcap_{i=1}^n F_i = F_n \ne \emptyset$ for every $n$, so (b) gives $\bigcap_{n=1}^\infty F_n \ne \emptyset$). Put
the two together and we’ve got our new characterization of compactness. 

Theorem 8.9. $M$ is compact if and only if it satisfies either (hence both) 8.8(a) or 8.8(b).

Proof. As noted above, conditions 8.8(a) and 8.8(b) imply that $M$ is totally bounded and complete, hence compact. So we need to show that compactness will imply, say, 8.8(a). To this end, suppose that $M$ is compact, and suppose that $\mathcal{G}$ is an open cover for $M$ that admits no finite subcover. We will work toward a contradiction.

Now $M$ is totally bounded, so $M$ can be covered by finitely many closed sets of diameter at most 1. It follows that at least one of these, call it $A_1$, cannot be covered by finitely many sets from $\mathcal{G}$. Certainly $A_1 \ne \emptyset$ (since the empty set is easy to cover!). Note that $A_1$ must be infinite.

Next, $A_1$ is totally bounded, so $A_1$ can be covered by finitely many closed subsets of $A_1$, each of diameter at most 1/2. At least one of these, call it $A_2$, cannot be covered by finitely many sets from $\mathcal{G}$.
Continuing, we get a decreasing sequence $A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$, where $A_n$ is closed, nonempty (infinite, actually), has $\operatorname{diam} A_n \le 1/n$, and cannot be covered by finitely many sets from $\mathcal{G}$.

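Now here’s the fly in the ointment! Let $x \in \bigcap_{n=1}^\infty A_n$ (this intersection is nonempty by the nested set theorem, because $M$ is complete and $\operatorname{diam} A_n \to 0$). Then, $x \in G \in \mathcal{G}$ for some $G$ (since $\mathcal{G}$ is an open cover) and so, since $G$ is open, $x \in B_\varepsilon(x) \subset G$ for some $\varepsilon > 0$. But for any $n$ with $1/n < \varepsilon$ we would then have $x \in A_n \subset B_\varepsilon(x) \subset G$. That is, $A_n$ is covered by a single set from $\mathcal{G}$. This is the contradiction that we were looking for. □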

Just look at the tidy form that the nested set theorem takes on in a compact space: 

Corollary 8.10. $M$ is compact if and only if every decreasing sequence of nonempty closed sets has nonempty intersection; that is, if and only if, whenever $F_1 \supset F_2 \supset \cdots$ is a sequence of nonempty closed sets in $M$, we have $\bigcap_{n=1}^\infty F_n \ne \emptyset$.

Proof. The forward implication is clear from Theorem 8.9. So, suppose that every nested sequence of nonempty closed sets in $M$ has nonempty intersection, and let $(x_n)$ be a sequence in $M$. Then there is some point $x$ in the nonempty set $\bigcap_{n=1}^\infty \overline{\{x_k : k \ge n\}}$. (Why?) It follows that some subsequence of $(x_n)$ must converge to $x$. □

Note that we no longer need to assume that the diameters of the sets $F_n$ tend to zero; hence, $\bigcap_{n=1}^\infty F_n$ may contain more than one point.

Corollary 8.11. M is compact if and only if every countable open cover admits 
a finite subcover. (Why?) 


EXERCISES 

Except where noted, $M$ is an arbitrary metric space with metric $d$.

> 30. Prove Lemma 8.8. 

31. Given an arbitrary metric space M , show that a decreasing sequence of nonempty 
compact sets in M has nonempty intersection. 

32. Prove Corollary 8.11 by showing that the following two statements are equivalent.

(i) Every decreasing sequence of nonempty closed sets in $M$ has nonempty intersection.

(ii) Every countable open cover of $M$ admits a finite subcover; that is, if $(G_n)$ is a sequence of open sets in $M$ satisfying $\bigcup_{n=1}^\infty G_n \supset M$, then $\bigcup_{n=1}^N G_n \supset M$ for some (finite) $N$.

33. Let $(M, d)$ be compact. Suppose that $(F_n)$ is a decreasing sequence of nonempty closed sets in $M$, and that $\bigcap_{n=1}^\infty F_n$ is contained in some open set $G$. Show that $F_n \subset G$ for all but finitely many $n$.
34. Let $A$ be a subset of a metric space $M$. Prove that $A$ is closed in $M$ if and only if $A \cap K$ is compact for every compact set $K$ in $M$. [Hint: If $(x_n)$ converges to $x$, then $\{x\} \cup \{x_n : n \ge 1\}$ is compact. (Why?)]

35. Let $\mathcal{G}$ be an open cover for $M$. We say that $\varepsilon > 0$ is a Lebesgue number for $\mathcal{G}$ if each subset of $M$ of diameter $< \varepsilon$ is contained in some $G \in \mathcal{G}$. If $M$ is compact, show that every open cover of $M$ has a Lebesgue number. [Hint: If not, there exists a set $E_n$ in $M$ with $\operatorname{diam}(E_n) < 1/n$ such that $E_n$ is not contained in any $G \in \mathcal{G}$.]

36. Let $F$ and $K$ be disjoint, nonempty subsets of a metric space $M$ with $F$ closed and $K$ compact. Show that $d(F, K) = \inf\{d(x, y) : x \in F,\ y \in K\} > 0$. Show that this may fail if we assume only that $F$ and $K$ are disjoint closed sets.

37. A real-valued function $f$ on a metric space $M$ is called lower semicontinuous if, for each real $\alpha$, the set $\{x \in M : f(x) > \alpha\}$ is open in $M$. Prove that $f$ is lower semicontinuous if and only if $f(x) \le \liminf_{n \to \infty} f(x_n)$ whenever $x_n \to x$ in $M$.

38. If $M$ is compact, prove that every lower semicontinuous function on $M$ is bounded below and attains a minimum value.

39. A function $f : M \to \mathbb{R}$ is called upper semicontinuous if $-f$ is lower semicontinuous. Formulate the analogues of Exercises 37 and 38 for upper semicontinuous functions.

40. Let $M$ be compact and let $f : M \to M$ satisfy $d(f(x), f(y)) = d(x, y)$ for all $x, y \in M$. Show that $f$ is onto. [Hint: If $B_\varepsilon(x) \cap f(M) = \emptyset$, consider the sequence $(f^n(x))$.]

41. Is compactness necessary in Exercise 40? That is, is it possible for a metric 
space to be isometric to a proper subset of itself? Explain. 

42. Let $M$ be compact and let $f : M \to M$ satisfy $d(f(x), f(y)) \ge d(x, y)$ for all $x, y \in M$. Prove that $f$ is an isometry of $M$ onto itself. [Hint: First, given $x \in M$, consider $x_n = f^n(x)$. By passing to a subsequence, if necessary, we may suppose that $(x_n)$ converges. Argue that $x_n \to x$. Next, given $x, y \in M$, show that we must have $d(f(x), f(y)) = d(x, y)$. Thus, $f$ is an isometry into $M$. Finally, argue that $f$ has dense range.]

43. Let $M$ be compact and suppose that $f : M \to M$ is one-to-one, onto, and satisfies $d(f(x), f(y)) \le d(x, y)$ for all $x, y \in M$. Prove that $f$ is an isometry of $M$ onto itself. [Hint: Exercise 42.]


Uniform Continuity 


As it happens, continuous functions on compact spaces turn out to be more than simply 
continuous. To better appreciate this, let’s first consider an easy example: 
Example 8.12

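The map $f : (0, 1) \to \mathbb{R}$ given by $f(x) = 1/x$ is continuous. But $f$ does not map nearby $x$ to nearby $f(x)$; for example, note that

$$\left| \frac{1}{n} - \frac{1}{n+1} \right| \to 0 \quad \text{while} \quad \left| f\!\left(\frac{1}{n}\right) - f\!\left(\frac{1}{n+1}\right) \right| = 1.$$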

What’s going on? 

We cannot overlook the fact that continuity is a pointwise phenomenon; that is, $f : M \to N$ is continuous if it is continuous at each point $x \in M$. And so, given $\varepsilon > 0$, the $\delta$ that “works” for one $x$ may not work so well for another. That is, $\delta$ typically depends on $x$ too. A shorthand reminder will help explain the situation:

$$\forall x \in M \quad \forall \varepsilon > 0 \quad \exists \delta(x, \varepsilon) > 0 \quad \text{such that} \ \ldots$$

We want to move the “$\exists \delta$” forward, ahead of the “$\forall x$”!

The question is, can we find a $\delta$ that does not depend on $x$? If so, $f$ is called uniformly continuous, because a single $\delta$ “works” uniformly for all $x$.
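To see that no single $\delta$ can work for the function in Example 8.12, fix $\varepsilon = 1/2$. The Python sketch below is an illustration only (the trial values of $\delta$ are arbitrary choices): for each trial $\delta$ it searches for points $x = 1/n$ and $y = 1/(n+1)$ of $(0, 1)$ with $|x - y| < \delta$, and confirms that $|f(x) - f(y)| = 1$ anyway. Exact rational arithmetic from Python's `fractions` module keeps rounding out of the picture.

```python
from fractions import Fraction

def f(x):
    """f(x) = 1/x on the open interval (0, 1)."""
    return 1 / x

eps = Fraction(1, 2)
for delta in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10 ** 6)):
    # find points x = 1/n, y = 1/(n + 1) in (0, 1) that are closer than delta
    n = 2
    while Fraction(1, n) - Fraction(1, n + 1) >= delta:
        n += 1
    x, y = Fraction(1, n), Fraction(1, n + 1)
    # |x - y| < delta, yet |f(x) - f(y)| = 1 >= eps: this delta does not "work"
    assert abs(x - y) < delta and abs(f(x) - f(y)) >= eps
    print(f"delta = {delta}: n = {n}, |f(x) - f(y)| = {abs(f(x) - f(y))}")
```

Shrinking $\delta$ only pushes the offending pair closer to 0, which is exactly where $f$ is steepest.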

Examples 8.13 

(a) A Lipschitz map $f : \mathbb{R} \to \mathbb{R}$ is uniformly continuous. If $f$ satisfies $|f(x) - f(y)| \le K|x - y|$ for all $x, y$ (with $K > 0$), then, given any $\varepsilon > 0$, the choice $\delta = \varepsilon/K$ always “works.”

(b) Recall that $|\sqrt{x} - \sqrt{y}| \le \sqrt{|x - y|}$ holds for any $x, y \ge 0$. It follows that $f(x) = \sqrt{x}$ is uniformly continuous on $[0, \infty)$, because $\delta = \varepsilon^2$ “works” for any $\varepsilon > 0$. Note, however, that $f$ is not Lipschitz on $[0, \infty)$, because $\sqrt{x}/x = 1/\sqrt{x} \to \infty$ as $x \to 0^+$.
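Both examples can be spot-checked numerically. The Python sketch below is only a random test, not a proof: it samples pairs of points and checks that $\delta = \varepsilon/K$ works for a sample Lipschitz function (we chose $\sin$, for which $K = 1$ by the mean value theorem) and that $\delta = \varepsilon^2$ works for $\sqrt{x}$ on $[0, \infty)$, wherever $x$ is located. It then computes the ratios $\sqrt{x}/x$ for small $x$, which grow without bound, showing why $\sqrt{x}$ fails to be Lipschitz. The sampling ranges and the seed are arbitrary choices.

```python
import math
import random

random.seed(0)  # reproducible samples

# (a) sin is Lipschitz with K = 1, so delta = eps / K = eps works.
for _ in range(10_000):
    x, y = random.uniform(-100.0, 100.0), random.uniform(-100.0, 100.0)
    assert abs(math.sin(x) - math.sin(y)) <= abs(x - y) + 1e-12

# (b) f(x) = sqrt(x) on [0, inf): delta = eps**2 works, whatever x is.
for eps in (1.0, 0.1, 0.01):
    delta = eps ** 2
    for _ in range(10_000):
        x = random.uniform(0.0, 1000.0)
        # pick y >= 0 with |x - y| < delta
        y = max(0.0, x + 0.999 * random.uniform(-delta, delta))
        assert abs(x - y) < delta
        assert abs(math.sqrt(x) - math.sqrt(y)) < eps

# sqrt is not Lipschitz near 0: the ratio sqrt(x)/x = 1/sqrt(x) is unbounded.
ratios = [math.sqrt(t) / t for t in (1e-2, 1e-4, 1e-6, 1e-8)]
print(ratios)
```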

It’s time we gave a formal definition: We say that $f : (M, d) \to (N, \rho)$ is uniformly continuous if

for every $\varepsilon > 0$ there is a $\delta > 0$ (which may depend on $f$ and $\varepsilon$) such that $\rho(f(x), f(y)) < \varepsilon$ whenever $x, y \in M$ satisfy $d(x, y) < \delta$.

We can easily change this to read: $f$ is uniformly continuous if, given $\varepsilon > 0$, there is a $\delta > 0$ such that $f(B_\delta(x)) \subset B_\varepsilon(f(x))$ for any $x \in M$. (Note that a uniformly continuous map is continuous, but not conversely.) Here’s a picturesque rephrasing of this definition:

$f$ is uniformly continuous if (and only if), for every $\varepsilon > 0$, there is a $\delta > 0$ such that $\operatorname{diam}_N f(A) < \varepsilon$ whenever $A \subset M$ satisfies $\operatorname{diam}_M(A) < \delta$. (Why?)

It follows that a uniformly continuous map / sends Cauchy sequences into Cauchy 
sequences. (Why?) 
EXERCISES

Except where noted, $M$ is an arbitrary metric space with metric $d$.

> 44. Show that any Lipschitz map $f : (M, d) \to (N, \rho)$ is uniformly continuous. In particular, any isometry is uniformly continuous.

45. Prove that every map $f : \mathbb{N} \to \mathbb{R}$ is uniformly continuous.

46. Show that $|d(x, z) - d(y, z)| \le d(x, y)$ and conclude that the map $x \mapsto d(x, z)$ is uniformly continuous on $M$ for each fixed $z \in M$.

47. Given a nonempty subset $A$ of $M$, show that $|d(x, A) - d(y, A)| \le d(x, y)$ and conclude that the map $x \mapsto d(x, A)$ is uniformly continuous on $M$.

> 48. Prove that a uniformly continuous map sends Cauchy sequences into Cauchy 
sequences. 

49. Show that the sum of uniformly continuous maps is uniformly continuous. Is 
the product of uniformly continuous maps always uniformly continuous? Explain. 

50. If $f$ is uniformly continuous on $(0, 2)$ and on $(1, 3)$, is $f$ uniformly continuous on $(0, 3)$? If $f$ is uniformly continuous on $[n, n + 1]$ for every $n \in \mathbb{Z}$, is $f$ necessarily uniformly continuous on $\mathbb{R}$? Explain.

51. If $f : (0, 1) \to \mathbb{R}$ is uniformly continuous, show that $\lim_{x \to 0^+} f(x)$ exists. Conclude that $f$ is bounded on $(0, 1)$.

52. Given $f : \mathbb{R} \to \mathbb{R}$ and $a \in \mathbb{R}$, define $F(x) = [f(x) - f(a)]/(x - a)$ for $x \ne a$. Prove that $f$ is differentiable at $a$ if and only if $F$ is uniformly continuous in some punctured neighborhood of $a$.

53. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is continuous and that $f(x) \to 0$ as $x \to \pm\infty$. Prove that $f$ is uniformly continuous.

> 54. Let $E$ be a bounded, noncompact subset of $\mathbb{R}$. Show that there is a continuous function $f : E \to \mathbb{R}$ that is not uniformly continuous.

> 55. Give an example of a bounded continuous map $f : \mathbb{R} \to \mathbb{R}$ that is not uniformly continuous. Can an unbounded continuous function $f : \mathbb{R} \to \mathbb{R}$ be uniformly continuous? Explain.

56. Prove that $f : (M, d) \to (N, \rho)$ is uniformly continuous if and only if $\rho(f(x_n), f(y_n)) \to 0$ for any pair of sequences $(x_n)$ and $(y_n)$ in $M$ satisfying $d(x_n, y_n) \to 0$. [Hint: For the backward implication, assume that $f$ is not uniformly continuous and work toward a contradiction.]

> 57. A function f : ℝ → ℝ is said to satisfy a Lipschitz condition of order α, where
α > 0, if there is a constant K < ∞ such that |f(x) − f(y)| ≤ K|x − y|^α for all
x, y. Prove that such a function is uniformly continuous.

> 58. Show that any function f : ℝ → ℝ having a bounded derivative is Lipschitz of
order 1. [Hint: Use the mean value theorem.]

59. The Lipschitz condition is interesting only for α ≤ 1; show that a function
satisfying a Lipschitz condition of order α > 1 is constant.
60. Show that x^α is uniformly continuous on (0, ∞) if and only if 0 < α ≤ 1.
[Hint: For 0 < α ≤ 1, show that x^α is Lipschitz of order α. Next, if α = 2, for
example, notice that √(n + 1) − √n → 0 as n → ∞. How does this help?]

61. Two metric spaces (M, d) and (N, ρ) are said to be uniformly homeomorphic
if there is a one-to-one and onto map f : M → N such that both f and f⁻¹ are
uniformly continuous. In this case we say that f is a uniform homeomorphism.
Prove that completeness is preserved by uniform homeomorphisms.

62. Two metrics d and ρ on a set M are said to be uniformly equivalent if the
identity map between (M, d) and (M, ρ) is uniformly continuous in both directions
(i.e., if the identity map is a uniform homeomorphism). If there are constants 0 < c,
C < ∞ such that cρ(x, y) ≤ d(x, y) ≤ Cρ(x, y) for every pair of points x, y ∈ M,
prove that d and ρ are uniformly equivalent.

63. Let d(x, y) = ‖x − y‖₂ be the usual (Euclidean) metric on ℝ², and define a
second metric ρ on ℝ² by

$$\rho(x, y) = \frac{\|x - y\|_2}{(1 + \|x\|_2^2)^{1/2}\,(1 + \|y\|_2^2)^{1/2}}.$$

Show that d and ρ are equivalent but not uniformly equivalent.

64. Show that the metric ρ = d/(1 + d) is always uniformly equivalent to d, but
that there are examples in which the inequality cρ ≤ d ≤ Cρ may fail to hold (for
all x, y).


It follows from our earlier observations that a uniformly continuous function maps 
sets of small diameter into sets of small diameter. But even more is true: 

Proposition 8.14. If f : M → N is uniformly continuous, then f maps totally
bounded sets into totally bounded sets.

proof. Let A ⊂ M be totally bounded and let ε > 0. Since f is uniformly
continuous, there is a δ > 0 so that f(B_δ(x)) ⊂ B_ε(f(x)) for any x ∈ M.
Next, since A is totally bounded, A ⊂ ⋃_{i=1}^{n} B_δ(xᵢ) for some x₁, ..., xₙ ∈ M.
Combining these observations yields f(A) ⊂ ⋃_{i=1}^{n} B_ε(f(xᵢ)). Hence, f(A) is
totally bounded. □

We can push this further still. If the domain space M is compact, then every continuous
function on M is actually uniformly continuous:

Theorem 8.15. If M is a compact metric space, then every continuous map
f : M → N is uniformly continuous.

proof. Let ε > 0. For each x ∈ M, let δ_x > 0 be chosen such that ρ(f(x), f(y))
< ε whenever y satisfies d(x, y) < δ_x. If we should be so lucky as to have
inf_x δ_x > 0, then we are done. (Why?) Otherwise, we want to reduce to finitely
many δ_x and take their minimum.
Now the collection {B_{δ_x/2}(x) : x ∈ M} is an open cover for M and so there are
finitely many points x₁, ..., xₖ ∈ M such that M ⊂ ⋃_{i=1}^{k} B_{rᵢ}(xᵢ), where rᵢ = δ_{xᵢ}/2.
This is the reduction to finitely many δ_x that we needed. Next we take the smallest
one; set δ = min{r₁, ..., rₖ} > 0. We claim that this δ “works” for 2ε.

Let x and y be in M with d(x, y) < δ. Now x ∈ B_{rᵢ}(xᵢ) for some i, so

$$d(y, x_i) \le d(y, x) + d(x, x_i) < \delta + r_i \le 2r_i = \delta_{x_i}.$$

Thus, since we already have d(x, xᵢ) < rᵢ < δ_{xᵢ}, we get

$$\rho(f(x), f(y)) \le \rho(f(x), f(x_i)) + \rho(f(x_i), f(y)) < \varepsilon + \varepsilon = 2\varepsilon. \qquad \square$$

Theorem 8.15 is an important result, and so it might be enlightening to discuss two
other proofs. The second (less direct) proof is based on Exercise 56. If f : M → N is
not uniformly continuous, then it follows from Exercise 56 that there are sequences (xₙ)
and (yₙ) in M and some ε > 0 such that d(xₙ, yₙ) → 0 while ρ(f(xₙ), f(yₙ)) ≥ ε > 0
for all n. (How?) If M is compact, though, we may assume that (xₙ) converges to a
point x ∈ M, by passing to a subsequence if necessary. The corresponding subsequence
of (yₙ) must also converge to x. That is, by relabeling, we may suppose that xₙ → x
and yₙ → x. But then, assuming that we started with a continuous map f, we’d have
f(xₙ) → f(x) and f(yₙ) → f(x) and, in particular, ρ(f(xₙ), f(yₙ)) → 0, which is a
contradiction.
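The second proof turns on the fact that, without compactness, the bad pairs (xₙ) and (yₙ) from Exercise 56 need not have convergent subsequences. As an illustration of our own choosing (not taken from the text), f(x) = x² is continuous on ℝ but not uniformly continuous: with xₙ = n + 1/n and yₙ = n we get d(xₙ, yₙ) = 1/n → 0, while |f(xₙ) − f(yₙ)| = 2 + 1/n² ≥ 2. The sketch below checks these values in exact rational arithmetic; the helper name witness_pairs is hypothetical.

```python
from fractions import Fraction


def f(x):
    """The example map x -> x**2 (continuous on R, not uniformly continuous)."""
    return x * x


def witness_pairs(count):
    """Return (d(x_n, y_n), |f(x_n) - f(y_n)|) for x_n = n + 1/n, y_n = n, n = 1..count."""
    pairs = []
    for n in range(1, count + 1):
        x_n = n + Fraction(1, n)
        y_n = Fraction(n)
        pairs.append((abs(x_n - y_n), abs(f(x_n) - f(y_n))))
    return pairs


for dist, gap in witness_pairs(5):
    # The distance shrinks like 1/n, but the gap in values stays at least 2.
    print(f"d(x_n, y_n) = {float(dist):.3f}   |f(x_n) - f(y_n)| = {float(gap):.3f}")
```

Because ℝ is not compact, (xₙ) has no convergent subsequence here, which is exactly the step of the second proof that breaks down.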

The third proof is “by picture.” Let’s first show that if f : [a, b] → ℝ is continuous,
then f is uniformly continuous. To begin, let ε > 0. We need to find a δ > 0 such that
if a pair of points x, y ∈ [a, b] satisfy |f(x) − f(y)| ≥ ε, then x and y also satisfy
|x − y| ≥ δ. (Why?) In other words, we want to show that the function d(x, y) = |x − y|
is bounded away from 0 on the set E = {(x, y) ∈ [a, b] × [a, b] : |f(x) − f(y)| ≥ ε}.

The square [a, b] × [a, b] is pictured in Figure 8.1. The shaded regions form the set
E. Note that E cannot hit the diagonal y = x because ε > 0. (That is, d(x, y) = |x − y| is
strictly positive on E.) The heart of the proof lies in the observation that E is compact,
and so it must be strictly separated from the diagonal by some positive distance.

Now since f is continuous, it follows that E is a closed subset of [a, b] × [a, b]
(a compact metric space), and hence is compact. This is easy enough to check by using
a sequential argument, but instead consider this: The function g(x, y) = |f(x) − f(y)|
is a continuous function on [a, b] × [a, b], and so E = g⁻¹([ε, ∞)) is closed. Finally,
since the function d(x, y) = |x − y| is continuous (and strictly positive) on E, it follows
that d attains a minimum value δ > 0 on E.
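The “proof by picture” can be imitated numerically. The sketch below is only an approximation under stated assumptions: it replaces [a, b] × [a, b] by a finite grid (the step count is an arbitrary choice), finds the grid pairs that land in E, and reports their smallest distance |x − y| from the diagonal as an estimate of δ. The function name estimate_delta is hypothetical, and a grid can miss points of E, so this illustrates the idea rather than proving anything.

```python
import math


def estimate_delta(f, a, b, eps, steps=2000):
    """Grid estimate of min{|x - y| : |f(x) - f(y)| >= eps} over [a, b] x [a, b].

    This is the distance from the grid version of E to the diagonal y = x.
    Returns math.inf if no grid pair lands in E.
    """
    h = (b - a) / steps
    xs = [a + k * h for k in range(steps + 1)]
    ys = [f(x) for x in xs]
    best = math.inf
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[j] - xs[i] >= best:
                break  # larger j only moves farther from the diagonal
            if abs(ys[i] - ys[j]) >= eps:
                best = xs[j] - xs[i]  # closest point of E in this row
                break
    return best


print(estimate_delta(math.sin, 0.0, 2 * math.pi, 0.1))
```

For f = sin the estimate cannot fall below ε = 0.1, since |sin x − sin y| ≤ |x − y|; in the picture, E stays at least that far from the diagonal.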

It is easy to modify this proof to work in the general case of a continuous function
f : (M, d) → (N, ρ) on a compact space M. Essentially repeat this proof, using d(x, y)
in place of |x − y| and ρ(f(x), f(y)) in place of |f(x) − f(y)|. The proof that the
corresponding set E is a closed subset of the compact space M × M is the same. The
details are left as an exercise.

Uniform continuity is often useful for finding extensions of continuous functions.
Here is a variation on Theorem 7.18 that explains how this is done (you might want to
recall the proof of Theorem 7.18 before reading on).

Theorem 8.16. Let D be dense in M, let N be complete, and let f : D → N be
uniformly continuous. Then, f extends uniquely to a uniformly continuous map
F : M → N, defined on all of M. Moreover, if f is an isometry, then so is the
extension F.

proof. First notice that uniqueness is obvious, because D is dense. That is, any
two continuous functions g, h : M → N that agree on D must actually agree on
all of M. Existence is the tough part.

We define F : M → N as follows (this is nearly the same scheme that we
used in the proof of Theorem 7.18): Given x ∈ M, there is a sequence (xₙ) in D
such that xₙ → x in M, since D is dense in M. Now (xₙ) is Cauchy in D, and
hence (f(xₙ)) is Cauchy in N, because f is uniformly continuous. Thus, since N
is complete, f(xₙ) → y for some y ∈ N. Set F(x) = y. In brief, if x = lim_{n→∞} xₙ,
where (xₙ) is in D, then set F(x) = lim_{n→∞} f(xₙ) in N.

First let’s check that F is well defined. If (xₙ) and (zₙ) are two sequences in D
with xₙ → x and zₙ → x, then the sequence x₁, z₁, x₂, z₂, ... also converges to
x. Thus, f(x₁), f(z₁), f(x₂), f(z₂), ... converges to some y ∈ N (as above). But
then we must have f(xₙ) → y and f(zₙ) → y. (Why?)

The fact that F is an extension of f, that is, that F|_D = f, is obvious because
f is continuous (besides, we get to use constant sequences).

Next we’ll check that F is uniformly continuous. (Watch the ε’s and δ’s carefully here!) Let ε > 0, and choose δ > 0 so that ρ(f(x′), f(y′)) < ε whenever
x′, y′ ∈ D with d(x′, y′) < δ. We claim that δ/3 “works” for 3ε and F. To see
this it will help matters if we first make an observation: Given x ∈ M, there is
an x′ ∈ D such that d(x, x′) < δ/3 and ρ(F(x), f(x′)) < ε. (Why? Because if
xₙ → x, where xₙ ∈ D, then f(xₙ) → F(x).)

The rest is easy. Given x, y ∈ M with d(x, y) < δ/3, choose x′, y′ ∈ D
(as above) such that d(x, x′) < δ/3, d(y, y′) < δ/3, ρ(F(x), f(x′)) < ε, and
ρ(F(y), f(y′)) < ε. But then d(x′, y′) ≤ d(x′, x) + d(x, y) + d(y, y′) < δ, and hence

$$\rho(F(x), F(y)) \le \rho(F(x), f(x')) + \rho(f(x'), f(y')) + \rho(f(y'), F(y)) < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon.$$
Finally, note that if f is an isometry, then so is F. Given x, y ∈ M, choose (xₙ)
and (yₙ) in D with xₙ → x and yₙ → y. Then

$$d(x, y) = \lim_{n\to\infty} d(x_n, y_n) = \lim_{n\to\infty} \rho(f(x_n), f(y_n)) = \rho(F(x), F(y)). \qquad \square$$
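The construction of F in Theorem 8.16 is concrete enough to run. In this minimal sketch we assume D = ℚ ∩ [0, 2] and the hypothetical uniformly continuous map f(q) = q² on D; the decimal truncations xₙ = ⌊10ⁿx⌋/10ⁿ play the role of a sequence in D converging to x, and F(x) is approximated by a late term of the Cauchy sequence (f(xₙ)). The names f, decimal_truncations, and extend are illustrative only, and finitely many terms only approximate the limit.

```python
import math
from fractions import Fraction


def f(q):
    """Hypothetical uniformly continuous map on D = Q ∩ [0, 2]: q -> q**2."""
    return q * q


def decimal_truncations(x, terms):
    """Points x_n = floor(10**n * x) / 10**n of D, n = 1..terms; they converge to x."""
    exact = Fraction(x)  # exact rational value of the float x
    return [Fraction(math.floor(exact * 10**n), 10**n) for n in range(1, terms + 1)]


def extend(x, terms=15):
    """Approximate F(x) = lim f(x_n) by the last computed term of the Cauchy sequence."""
    xs = decimal_truncations(x, terms)
    return float(f(xs[-1]))


print(extend(math.sqrt(2)))  # approximately 2
```

For x already in D (for instance x = 0.5) the truncations are eventually constant, so the sketch returns f(x) itself, in line with F|_D = f.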


Corollary 8.17. Completions are unique (up to isometry). That is, if M₁ and M₂
are completions of M, then M₁ and M₂ are isometric.


EXERCISES 

Throughout, M denotes a generic metric space with metric d.

65. If f : (0, 1) → ℝ is continuous, and if both f(0+) and f(1−) exist, show
that the function F defined by F(0) = f(0+), F(1) = f(1−), and F(x) = f(x)
for 0 < x < 1 is uniformly continuous on [0, 1].

66. If f : (0, 1) → ℝ is uniformly continuous, show that lim_{x→0⁺} f(x) exists.
Conclude that f is bounded on (0, 1).

67. Define f : ℓ₂ → ℓ₁ by f(x) = … . Show that f is uniformly continuous.

68. Fix y ∈ ℓ∞ and define g : ℓ₁ → ℓ₁ by g(x) = … . Show that g is uniformly continuous.

69. Prove Theorem 8.15 by supplying the details to the “proof by picture” in the 
general case. 

70. Let K = {x ∈ ℓ∞ : lim xₙ = 1}. Prove:

(a) K is a closed (and hence complete) subset of ℓ∞.

(b) If T : ℓ∞ → ℓ∞ is given by T(x) = (0, x₁, x₂, ...) for x = (x₁, x₂, ...) in
ℓ∞, that is, if T shifts the entries forward and puts 0 in the empty slot, then
T(K) ⊂ K.

(c) T is an isometry on K, but T has no fixed point in K.

71. If A is dense in M, show that A and M have the same completion (isometrically).

72. Let D be dense in M. Show that M is isometric to a subset of ℓ∞(D). [Hint:
First embed D into ℓ∞(D) and then apply Theorem 8.16.] In particular, every
separable metric space is isometric to a subset of ℓ∞. (But ℓ∞ is not separable.
Why?)


Equivalent Metrics 


As a last topic related to both compactness and uniform continuity, we discuss several
notions of equivalence for metrics (and norms). Throughout, we will suppose that d
and ρ are two metrics on the same set M. We will write i : (M, d) → (M, ρ) for the
identity map and i⁻¹ : (M, ρ) → (M, d) for its inverse (also the identity map, but in the
other direction).

We say that d and ρ are equivalent if both i and i⁻¹ are continuous (that is, if i is a
homeomorphism), and we say that d and ρ are uniformly equivalent if i and i⁻¹ are both
uniformly continuous (that is, if i is a uniform homeomorphism). Finally, we say that d
and ρ are strongly equivalent if both i and i⁻¹ are Lipschitz. That is, d and ρ are strongly
equivalent if there exist constants 0 < c, C < ∞ such that cρ(x, y) ≤ d(x, y) ≤ Cρ(x, y)
for all x, y ∈ M. (Some authors would state this requirement by saying that i is a
lipeomorphism.) Actually, many authors take strong equivalence as their definition of
simple equivalence, but, as we shall see, there are some differences between the three
definitions. In any case, it is easy to see that

strongly equivalent ⟹ uniformly equivalent ⟹ equivalent.

In this section we will see that neither of these implications will reverse, in general, 
without some additional hypothesis. 

Example 8.18 

Consider d(x, y) = |x − y| and ρ(x, y) = √|x − y| on M = [0, 1]. Then, d
and ρ are equivalent. (Recall Exercise 3.42. In fact, d and ρ are even uniformly
equivalent. Why?) However, c√|x − y| ≤ |x − y| cannot hold for any c > 0
(and all x, y). That is, d and ρ are not strongly equivalent. Here’s why: Replace
|x − y| by t and suppose that c√t ≤ t for some c > 0 and all 0 < t ≤ 1. Then, by
dividing, we would have c ≤ √t for all 0 < t ≤ 1, which is clearly impossible
(since √t → 0 as t → 0⁺).
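A quick numerical look at Example 8.18: on M = [0, 1] the ratio d(x, y)/ρ(x, y) equals √|x − y|, so any constant c with cρ ≤ d would have to lie below every ratio printed here. The sample distances t = 10⁻ᵏ are an arbitrary choice made for this sketch.

```python
import math


def d(x, y):
    return abs(x - y)


def rho(x, y):
    return math.sqrt(abs(x - y))


# On [0, 1] with t = |x - y|, the ratio d/rho equals sqrt(t); a constant c > 0
# with c * rho <= d would have to stay below every one of these ratios.
ts = [10.0 ** -k for k in range(1, 9)]
ratios = [d(0.0, t) / rho(0.0, t) for t in ts]
for t, r in zip(ts, ratios):
    print(f"t = {t:.0e}: d/rho = {r:.2e}")
```

The other half of strong equivalence is harmless here: d ≤ ρ on [0, 1], since t ≤ √t for 0 ≤ t ≤ 1.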


EXERCISES 

73. Given any metric space (M, d), show that the metric ρ = d/(1 + d) is always
uniformly equivalent to d but that there are cases in which the inequality d ≤ Cρ
may fail to hold.

74. Let d(x, y) = ‖x − y‖₂ be the usual (Euclidean) metric on ℝ², and define a
second metric ρ on ℝ² by

$$\rho(x, y) = \frac{\|x - y\|_2}{(1 + \|x\|_2^2)^{1/2}\,(1 + \|y\|_2^2)^{1/2}}.$$

Show that d and ρ are equivalent but not uniformly equivalent.


It is easy to imagine at least one case where equivalence and uniform equivalence
should coincide. If (M, d) is compact, then every continuous map on M is actually
uniformly continuous, and so equivalence and uniform equivalence might very well be
one and the same. And so they are.

Proposition 8.19. Suppose that (M, d) is compact and that ρ is another metric
on M. Then d and ρ are equivalent if and only if d and ρ are uniformly equivalent.

proof. One direction is trivial, so suppose that d and ρ are equivalent. The identity
map i : (M, d) → (M, ρ) is continuous and onto; hence i is uniformly continuous and
(M, ρ) is compact. Now, by applying the same reasoning to i⁻¹, it follows that i⁻¹ is
uniformly continuous. □

In spite of the fact that the three notions of equivalence are different, in general, we 
will establish the rather surprising fact that all three coincide when applied to norms 
on any vector space. To see this, we will first need to collect a few preliminary results 
about linear maps between normed vector spaces, each of which is interesting in its 
own right. In particular, for a linear map, we will show that continuity at a single point 
automatically gives us uniform continuity (and even more). 

For the next several results, we suppose that (V, ‖·‖) and (W, |||·|||) are normed
vector spaces and that T : V → W is a linear map. That is, T is a vector space
homomorphism. This means that T “respects” vector space operations in the sense
that T(αx + βy) = αT(x) + βT(y) for any x, y ∈ V and any scalars α, β ∈ ℝ. In
particular, a linear map always satisfies T(0) = 0.

Theorem 8.20. Let (V, ‖·‖) and (W, |||·|||) be normed vector spaces, and let
T : V → W be a linear map. Then the following are equivalent:

(i) T is Lipschitz;

(ii) T is uniformly continuous;

(iii) T is continuous (everywhere);

(iv) T is continuous at 0 ∈ V;

(v) there is a constant C < ∞ such that |||T(x)||| ≤ C‖x‖ for all x ∈ V.

proof. Clearly, (i) ⟹ (ii) ⟹ (iii) ⟹ (iv). We need to show that (iv) ⟹ (v)
and that (v) ⟹ (i) (for example). The second of these is easier, so let’s start there.

(v) ⟹ (i): If condition (v) holds for a linear map T, then T is Lipschitz (with
constant C) because |||T(x) − T(y)||| = |||T(x − y)||| ≤ C‖x − y‖ for any x, y ∈ V.

(iv) ⟹ (v): Suppose that T is continuous at 0. Then we may choose a δ > 0
so that |||T(x)||| = |||T(x) − T(0)||| < 1 whenever ‖x‖ = ‖x − 0‖ ≤ δ (shrinking δ
if necessary to allow equality).

Given 0 ≠ x ∈ V, we scale by the factor δ/‖x‖ to get ‖δx/‖x‖‖ = δ. Hence,
|||T(δx/‖x‖)||| < 1. But T(δx/‖x‖) = (δ/‖x‖)T(x), because T is linear, and so
we get |||T(x)||| < (1/δ)‖x‖. That is, C = 1/δ works in condition (v). (Since
condition (v) is trivial for x = 0, we only care about the case in which x ≠ 0.) □

A linear map satisfying condition (v) of Theorem 8.20 (i.e., a continuous linear map)
is often said to be bounded. The meaning of bounded in this context is slightly different
than usual; here it means that T maps bounded sets to bounded sets. This follows from
the fact that T is Lipschitz. Indeed, if |||T(x)||| ≤ C‖x‖ for all x ∈ V, then (as we saw
earlier) |||T(x) − T(y)||| ≤ C‖x − y‖ for any x, y ∈ V, and hence T maps the ball about
x of radius r into the ball about T(x) of radius Cr. In symbols, T(B_r(x)) ⊂ B_{Cr}(T(x)).
More generally, T maps a set of diameter d into a set of diameter at most Cd. There is
no danger of confusion in our using the word bounded to mean something new here;
the ordinary usage of the word (as applied to functions) is uninteresting for linear maps.
A nonzero linear map always has an unbounded range. (Why?)

Given normed vector spaces (V, ‖·‖) and (W, |||·|||), the collection of all bounded
linear maps T : V → W is itself a vector space under the usual pointwise operations
on functions. That is, if S, T : V → W are continuous, linear maps, and if α, β ∈ ℝ,
then the map αS + βT : V → W, defined by

(αS + βT)(x) = αS(x) + βT(x), x ∈ V,

is again linear and continuous. The collection of all continuous, linear maps from V
into W will be denoted by B(V, W), where B stands for “bounded.”

Theorem 8.20 provides a natural candidate for a norm on B(V, W). If T : V → W
is continuous and linear, we define the norm of T to be the smallest constant C that
“works” in Theorem 8.20 (v). Thus, the norm of T is given by

$$\|T\| = \inf\{C : |\!|\!|Tx|\!|\!| \le C\|x\| \text{ for all } x \in V\} = \sup_{x \ne 0} \frac{|\!|\!|Tx|\!|\!|}{\|x\|}.$$

That is, ‖T‖ satisfies |||Tx||| ≤ ‖T‖ ‖x‖ for all x ∈ V, and ‖T‖ is the smallest constant
satisfying this inequality for all x ∈ V. The proof that this new expression, called the
operator norm, actually is a norm on B(V, W) is left as an exercise.
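For matrices the operator norm can be estimated by brute force. This sketch assumes V = W = ℝ², both carrying the ℓ₁ norm ‖x‖₁ = |x₁| + |x₂|, and a hypothetical matrix A. It takes the largest value of ‖Ax‖₁ over a grid on the unit sphere {‖x‖₁ = 1}, which suffices because, by homogeneity, the supremum over x ≠ 0 of ‖Ax‖₁/‖x‖₁ can be taken over unit vectors. For comparison it prints the largest absolute column sum of A, a standard formula for this particular operator norm that we use here without proof.

```python
def l1(v):
    return sum(abs(t) for t in v)


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def estimate_operator_norm(A, samples=4001):
    """Largest ||Ax||_1 over a grid on the l1 unit sphere of R^2.

    Every grid point x has ||x||_1 = 1 (up to rounding), so ||Ax||_1 is the ratio in the sup.
    """
    best = 0.0
    for k in range(samples):
        t = -1.0 + 2.0 * k / (samples - 1)
        for sign in (1.0, -1.0):
            x = [t, sign * (1.0 - abs(t))]
            best = max(best, l1(matvec(A, x)))
    return best


A = [[1.0, -2.0], [3.0, 4.0]]  # hypothetical example matrix
max_col_sum = max(abs(A[0][j]) + abs(A[1][j]) for j in range(2))
print(estimate_operator_norm(A), max_col_sum)
```

The grid contains the basis vectors ±e₁ and ±e₂, where this supremum is attained, so here the estimate matches the column-sum value.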


EXERCISES 

75. Suppose that f : ℝ → ℝ satisfies f(x + y) = f(x) + f(y) for every x,
y ∈ ℝ. If f is continuous at a point x₀ ∈ ℝ, prove that there is some constant a ∈ ℝ
such that f(x) = ax for all x ∈ ℝ. That is, an additive function that is continuous at
even one point is linear, and hence continuous on all of ℝ.

76. Fix y ∈ ℝⁿ and define a linear map L : ℝⁿ → ℝ by L(x) = ⟨x, y⟩. Show that
L is continuous and compute ‖L‖ = sup_{x≠0} |L(x)|/‖x‖₂. [Hint: Cauchy-Schwarz!]

77. Fix k ≥ 1 and define f : ℓ₂ → ℝ by f(x) = xₖ. Show that f is linear and
has ‖f‖ = 1.

78. Define a linear map f : ℓ₂ → ℓ₁ by f(x) = (xₙ/n)_{n=1}^∞. Is f bounded? If so,
what is ‖f‖?

79. If S, T ∈ B(V, W), show that S + T ∈ B(V, W) and that ‖S + T‖ ≤ ‖S‖ +
‖T‖. Using this, complete the proof that B(V, W) is a normed space under the
operator norm.

80. Show that the definite integral I(f) = ∫_a^b f(t) dt is continuous from C[a, b]
into ℝ. What is ‖I‖?

81. Prove that the indefinite integral, defined by T(f)(x) = ∫_a^x f(t) dt, is continuous
as a map from C[a, b] into C[a, b]. Estimate ‖T‖.

82. For T ∈ B(V, W), prove that ‖T‖ = sup{|||Tx||| : ‖x‖ = 1}.
83. If V is any normed vector space, show that B(V, ℝ) is always complete. [Hint:
Use Banach’s characterization, Theorem 7.12.]

84. Prove that B(V, W) is complete whenever W is complete.


Theorem 8.20, besides being merely spectacular, does even more for us: It supplies 
the proof that “equivalent” and “strongly equivalent” coincide for norms. (Recall that 
two norms are said to be equivalent if the metrics that they induce are equivalent. The 
same goes for strongly equivalent.) 

Corollary 8.21. Let ‖·‖ and |||·||| be two norms on a vector space V. Then,
‖·‖ and |||·||| are equivalent if and only if there are constants 0 < c, C < ∞ such
that c‖x‖ ≤ |||x||| ≤ C‖x‖ for every x ∈ V.

proof. The key here is that both the identity map i : (V, ‖·‖) → (V, |||·|||) and
its inverse i⁻¹ are linear. Now, ‖·‖ and |||·||| are equivalent if and only if both i
and i⁻¹ are continuous. By Theorem 8.20, i and i⁻¹ are continuous if and only
if there exist constants 0 < c, C < ∞ such that |||x||| ≤ C‖x‖ and ‖x‖ ≤ c⁻¹|||x|||
for all x ∈ V. (Why?) □

Once again, if we bring compactness into the picture, we can say even more. We
will use the fact that closed balls in ℝⁿ are compact to prove:

Theorem 8.22. Any two norms on a finite-dimensional vector space are equivalent.

proof. Let V be an n-dimensional vector space with basis x₁, ..., xₙ. We will
define a specific, convenient norm on V and prove that any other norm on V is
equivalent to ours. To do this, it will help if we first recall a simple fact from linear
algebra.

Algebraically, V is just ℝⁿ in disguise. Each x ∈ V can be uniquely written
as x = Σ_{i=1}^{n} αᵢxᵢ, for some scalars α₁, ..., αₙ ∈ ℝ. Thus we may think of
x as the n-tuple (α₁, ..., αₙ) ∈ ℝⁿ. That is, the basis-to-basis map xᵢ ↦ eᵢ =
(0, ..., 0, 1, 0, ..., 0) (the usual basis in ℝⁿ) is a vector space isomorphism between
V and ℝⁿ.

Given this, we can easily define a norm on V by “borrowing” a norm from ℝⁿ.
Specifically, let

$$\|x\| = \sum_{i=1}^{n} |\alpha_i| = \|(\alpha_1, \ldots, \alpha_n)\|_1$$

for each x = Σ_{i=1}^{n} αᵢxᵢ ∈ V. Since x₁, ..., xₙ is a basis, this clearly defines a
norm on V:

$$\|x\| = 0 \iff \alpha_i = 0 \text{ for all } i \iff x = 0.$$

Moreover, the basis-to-basis map is a linear isometry between (V, ‖·‖) and
(ℝⁿ, ‖·‖₁).
Here is what we need out of all of this: The unit sphere S = {x ∈ V : ‖x‖ = 1}
is compact in (V, ‖·‖) because the corresponding set in ℝⁿ is compact. (Why?)
Now we can start the proof of the theorem!

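Suppose that |||·||| is any other norm on V. Then, for x = Σ_{i=1}^{n} αᵢxᵢ, we have

$$|\!|\!|x|\!|\!| = \Big|\!\Big|\!\Big|\sum_{i=1}^{n} \alpha_i x_i\Big|\!\Big|\!\Big| \le \sum_{i=1}^{n} |\alpha_i|\,|\!|\!|x_i|\!|\!| \le \Big(\max_{1\le j\le n} |\!|\!|x_j|\!|\!|\Big) \sum_{i=1}^{n} |\alpha_i| = C\|x\|, \quad \text{where } C = \max_{1\le j\le n} |\!|\!|x_j|\!|\!|.$$

That is, |||x||| ≤ C‖x‖ for every x ∈ V.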

For the other inequality we will need to use our observation about the unit
sphere S. The inequality that we have just proved tells us that |||·||| is a continuous
function on (V, ‖·‖). Indeed, | |||x||| − |||y||| | ≤ |||x − y||| ≤ C‖x − y‖ for any
x, y ∈ V. But then, |||·||| is also continuous on S, and so |||·||| must assume a
minimum value on S, say c ∈ ℝ. That is, |||x||| ≥ c whenever ‖x‖ = 1. Since
this minimum is actually attained, we must also have c > 0. (Why?) Now we’re
cooking! Given 0 ≠ x ∈ V we have x/‖x‖ ∈ S, and hence |||x/‖x‖||| ≥ c. That
is, |||x||| ≥ c‖x‖. □
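To see the two constants of this proof concretely, take V = ℝ² with its standard basis, so the norm built above is ‖x‖ = |x₁| + |x₂|, and let |||·||| be the Euclidean norm (our choice of example). The proof gives C = max |||xⱼ||| = 1, while c is the minimum of the Euclidean norm on the ℓ₁ unit sphere, namely 1/√2 at (1/2, 1/2). The sketch below approximates both by sampling the sphere; the function name equivalence_constants and the grid size are assumptions, and sampling only stands in for the compactness argument.

```python
import math


def euclid(v):
    return math.hypot(v[0], v[1])


def equivalence_constants(other_norm, samples=4001):
    """Estimate (c, C) = (min, max) of other_norm on the l1 unit sphere of R^2.

    Then c * ||x||_1 <= other_norm(x) <= C * ||x||_1, up to sampling error.
    """
    values = []
    for k in range(samples):
        t = -1.0 + 2.0 * k / (samples - 1)
        for sign in (1.0, -1.0):
            values.append(other_norm((t, sign * (1.0 - abs(t)))))
    return min(values), max(values)


c, C = equivalence_constants(euclid)
print(c, C)  # compare with 1/sqrt(2) and 1
```

Swapping in another norm, such as max(|x₁|, |x₂|), gives c = 1/2 and C = 1 in the same way.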

The fact that all norms on a finite-dimensional normed space are equivalent elevates 
the merely spectacular to the simply phenomenal: 

Corollary 8.23. Let V and W be normed vector spaces with V finite-dimensional.
Then, every linear map T : V → W is continuous.

proof. Let x₁, ..., xₙ be a basis for V and let ‖Σ_{i=1}^{n} αᵢxᵢ‖ = Σ_{i=1}^{n} |αᵢ|, as above.
We may assume that this is “the” norm on V, since, by Theorem 8.22, every norm
produces the same continuous functions on V.

Now if T : (V, ‖·‖) → (W, |||·|||) is linear, we get

$$|\!|\!|T(x)|\!|\!| = \Big|\!\Big|\!\Big|\sum_{i=1}^{n} \alpha_i T(x_i)\Big|\!\Big|\!\Big| \le \sum_{i=1}^{n} |\alpha_i|\,|\!|\!|T(x_i)|\!|\!| \le \Big(\max_{1\le j\le n} |\!|\!|T(x_j)|\!|\!|\Big) \sum_{i=1}^{n} |\alpha_i|.$$

That is, |||T(x)||| ≤ C‖x‖, where C = max_{1≤j≤n} |||T(xⱼ)|||. By Theorem 8.20, T is
continuous. □
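The constant in this proof is easy to test. In this sketch T is a hypothetical 2 × 3 matrix viewed as a map from (ℝ³, ‖·‖₁) to (ℝ², ‖·‖₂), with the standard basis playing the role of x₁, x₂, x₃; C is the largest Euclidean length of a column T(xⱼ), and random trial vectors (from a seeded generator, for reproducibility) check that |||T(x)||| ≤ C‖x‖ holds on every trial.

```python
import math
import random


def l1(v):
    return sum(abs(t) for t in v)


def l2(v):
    return math.sqrt(sum(t * t for t in v))


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical T : (R^3, l1 norm) -> (R^2, Euclidean norm); column j is T(e_j).
T = [[2.0, -1.0, 0.5],
     [0.0, 3.0, -4.0]]
columns = [[row[j] for row in T] for j in range(3)]
C = max(l2(col) for col in columns)  # C = max_j |||T(x_j)||| from the proof

rng = random.Random(0)
worst = 0.0
for _ in range(10_000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    if l1(x) > 0:
        worst = max(worst, l2(matvec(T, x)) / l1(x))
print(worst, C)  # the observed ratio stays at or below C (up to rounding)
```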

Corollary 8.23 allows us to clean up a detail left over from Chapter Five: 

Corollary 8.24. Any two finite-dimensional normed vector spaces of the same
dimension are uniformly homeomorphic. In fact, we can even find a linear (and
hence Lipschitz) homeomorphism between them.
Corollary 8.25. Every finite-dimensional normed vector space is complete.
(Why?)

Corollary 8.26. A finite-dimensional linear subspace of any normed vector space 
is always closed. (Why?) 


EXERCISES 


85. Fill in the missing details in the proof of Theorem 8.22. 

86. If (V, ‖·‖) is an n-dimensional normed vector space, show that there is a norm
|||·||| on ℝⁿ such that (ℝⁿ, |||·|||) is linearly isometric to (V, ‖·‖).


87. Prove Corollary 8.24. 


88. Prove Corollary 8.25. 


89. Corollary 8.26 is of interest because an infinite-dimensional normed space may
have nonclosed subspaces. For example, show that {x ∈ ℓ₁ : xₙ = 0 for all but finitely
many n} is a proper dense linear subspace of ℓ₁.




Notes and Remarks 

The classical definition of compactness, due to Fréchet, is the statement of Theorem 8.2:
Each sequence has a convergent subsequence. But early usages of the word “compact”
often referred to what we have called precompact sets, that is, sets whose closures are compact.
In effect, then, the Bolzano-Weierstrass theorem characterizes the bounded sets as the
precompact subsets of ℝ. Hausdorff first proved the theorem that we have taken as our
starting point: A space is compact if and only if it is complete and totally bounded.

The property described in Lemma 8.8 (a) is generally taken as the formal definition
of compactness for topological spaces, due to Alexandrov and Urysohn [1924] (who
used the word “bicompact” in describing such spaces). It has as its basis the so-called
Heine-Borel or Borel-Lebesgue theorems (a covering of a closed, bounded interval by
open sets has a finite subcover). Riesz [1908] added the finite intersection property to
the list for subsets of ℝⁿ, while the general case is due to Sierpinski [1918]. For more on
the early history of Theorem 8.9, see Dudley [1989], Manheim [1964], Temple [1981],
Willard [1970], and the award-winning article by Hildebrandt [1926] (reprinted in
Abbott [1978]). The property described in Theorem 8.2 is called sequential compactness,
while the property described in Corollary 8.11 is called countable compactness.
In a metric space, each of these coincides with the formal definition of compactness,
but this is not always the case in more general topological spaces.

Corollary 8.11 is due to Fréchet. For more on Exercise 27, see Apostol [1975], Buck
[1967], and Thurston [1989]. Exercises 29 and 40-43 are taken from Kaplansky [1977].
For more on the results stated in Exercises 28 and 29, see D. F. Bailey [1989] (and its
bibliography), and Bennett and Fisher [1974]. Semicontinuity (Exercises 37-39) was
introduced by Baire [1899]. See Radó [1942] for more details.
For a survey of applications of compactness in analysis, see Hewitt [1960]. For a
simplified treatment of the classical theorems presented in this chapter in the case of a
closed bounded interval [a, b], see Botsko [1987]. Barnsley [1988] and Edgar [1990],
on the other hand, illustrate certain “modern” applications of compactness.

Exercise 70 is adapted from an exercise in Hoffman [1975]. It would seem that Heine
was the first to define uniform continuity for real-valued functions; he used it to prove
Theorem 8.15 for real-valued functions defined on a closed bounded interval [a, b].
According to Dudley [1989], Heine gave a great deal of credit to unpublished lectures
of Weierstrass. The metric space definition is due to Fréchet and Hausdorff. The clever
“proof by picture” for Theorem 8.15 is taken from the article by D. M. Bloom [1989].
Several authors have considered the problem of characterizing those spaces for which
all continuous maps are uniformly continuous; see, for example, Beer [1988], Chaves
[1985], Hueber [1981], Levine [1960], and Snipes [1984].

The discussion of equivalence, strong equivalence, and uniform equivalence for
metrics is based in part on the presentation in Kuller [1969]. Maddox [1989] gives an
elementary computation of the norm of a linear map on C[a, b] defined by an integral,
as in Exercises 80 and 81.

Analysis in infinite-dimensional normed vector spaces is vastly different from the
finite-dimensional case. To fully appreciate the extent of the difference is beyond our
means just now, but we can at least indicate a few reasons. For one, recall that S = {x ∈
ℓ₂ : ‖x‖₂ = 1}, the unit sphere in ℓ₂, is not compact. (Remember the eₙ?) Thus, the
proofs of Theorem 8.22 and Corollary 8.23 fall apart in ℓ₂. But the same would be true of
any infinite-dimensional space. In fact, it turns out that a normed linear space (V, ‖·‖) is
finite-dimensional if and only if its closed unit ball B = {x ∈ V : ‖x‖ ≤ 1} is compact.
Moreover, (V, ‖·‖) is infinite-dimensional if and only if there exists a discontinuous
linear map T : V → ℝ if and only if V contains a proper dense subspace. On the other
hand, Corollary 8.24 can be at least partially salvaged: Anderson [1962] has shown
that all separable, infinite-dimensional Banach spaces are (mutually) homeomorphic.
We cannot hope for uniformly homeomorphic here since, for example, it is known that
ℓ_p and ℓ_q are not uniformly homeomorphic for any 1 < p < q < ∞. For much more
on this, see the note by Bessaga and Pełczyński [1987] in the English translation of
Banach’s book.
Discontinuous Functions


We have had a lot to say so far about continuous functions, but what about discontinuous
functions? Is there anything meaningful we might say about them? In order that we
might ask more precise questions, let’s fix our notation. Throughout this section, we
will be concerned with a function f : ℝ → ℝ, and we will write D(f) for the set of
points at which f is discontinuous. The questions are: What can we say about D(f)?
What kind of set is it? Can any set be realized as the set of discontinuities of a function,
or does D(f) have some distinguishing characteristics? To get us started, let’s recall a
few examples.


Examples 9.1 

(a) If f is monotone, then D(f) is countable. Conversely, any countable set is the
set of discontinuities for some monotone f (see Exercise 2.34).

(b) There are examples of functions f, g with D(f) = ℚ and D(g) = ℝ. (What are
they?)


In particular, we might ask whether D(f) can be a proper, uncountable subset of ℝ.
For example, is there an f with D(f) = ℝ \ ℚ? or with D(f) = Δ, the Cantor set? The answer to the
first question is: No, and to the second: Yes, but to understand this will require a bit of
machinery.

The first thing we need is a detailed description of D(f). For this we will simply
negate the definition of the statement “f is continuous at a”:

$$a \in D(f) \iff \begin{cases} \text{there exists an } \varepsilon > 0 \text{ such that, given any } \delta > 0, \\ \text{we have } |f(x) - f(a)| \ge \varepsilon \text{ for some } x \text{ with } |x - a| < \delta. \end{cases}$$

What this means is that, given any bounded, open interval I containing a, we always
have sup{|f(x) − f(y)| : x, y ∈ I} ≥ ε. (Why?) This supremum has a geometric
description (which is why we want to use it); indeed, notice that

$$\sup_{x, y \in I} |f(x) - f(y)| = \operatorname{diam} f(I).$$

We will write our description of D(f) in terms of this supremum, but first we will give
it a name. Given a bounded interval I, we define ω(f; I), the oscillation of f on I, by
ω(f; I) = sup{|f(x) − f(y)| : x, y ∈ I}. Note that 0 ≤ ω(f; I) ≤ 2 sup_{x∈I} |f(x)|. Of
course, if f is unbounded on I, we set ω(f; I) = ∞.
Also notice that ω(f; I) decreases as I decreases; that is, if J ⊂ I, then ω(f; J) ≤
ω(f; I). Consequently, if f is bounded in some neighborhood of a, and if we consider
intervals that “shrink” to a, then the oscillations over those intervals will decrease to a
fixed (finite) number. These observations allow us to define the oscillation of f at a,
written ω_f(a), by

$$\omega_f(a) = \inf_{\substack{I \ni a \\ I \text{ open}}} \omega(f; I) = \lim_{h \to 0^+} \omega(f; (a - h, a + h)) = \lim_{h \to 0^+} \operatorname{diam} f(B_h(a)),$$

where the notation I ∋ a is intended as a reminder that the infimum is over bounded
(open) intervals I containing a. If f is unbounded in every neighborhood of a, we
set ω_f(a) = ∞. We have insisted on open intervals in the definition of ω_f(a) to be
consistent with the characterization of discontinuity at a that we gave earlier.

The oscillation of f at a is rather like the “jump” in the graph of f at a (if any). For
example, if f is increasing, then ω_f(a) = f(a+) − f(a−). In any case, we always have
ω_f(a) ≥ 0, and our earlier discussion tells us that a ∈ D(f) if and only if ω_f(a) > 0.
That is, f is continuous at a if and only if ω_f(a) = 0. (Why?)
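The limit defining ω_f(a) can be approximated by sampling f on a small open interval around a and taking max − min of the samples, which equals the supremum of |f(x) − f(y)| over the sample points. In this sketch the radius h and the number of samples are arbitrary assumptions, and sampling can miss values of f, so it only gives a lower estimate of ω(f; (a − h, a + h)); the function name approx_oscillation is hypothetical. The step function math.floor, which is increasing, has ω_f(1) = f(1+) − f(1−) = 1 and is continuous at 1/2.

```python
import math


def approx_oscillation(f, a, h=1e-6, samples=101):
    """Estimate omega_f(a) by max - min of f at sample points of (a - h, a + h).

    max - min over the samples equals sup |f(x) - f(y)| over those samples, so this
    is a lower estimate of omega(f; (a - h, a + h)); a fixed small h stands in for h -> 0+.
    """
    pts = [a - h + 2 * h * k / (samples - 1) for k in range(1, samples - 1)]  # endpoints excluded
    vals = [f(x) for x in pts]
    return max(vals) - min(vals)


print(approx_oscillation(math.floor, 1.0))  # jump of size 1 at a = 1
print(approx_oscillation(math.floor, 0.5))  # 0: floor is continuous at 1/2
```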

Now we are ready to give a more detailed description of D(f). 

Theorem 9.2. If f : R → R, then D(f) is the countable union of closed sets in R.

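Proof. First, let’s write D(f) as a countable union:

$$\begin{aligned} D(f) &= \{a : \omega_f(a) > 0\} \\ &= \{a : \omega_f(a) \ge \varepsilon \text{ for some } \varepsilon > 0\} \\ &= \bigcup_{n=1}^{\infty} \{a : \omega_f(a) \ge 1/n\}. \qquad \text{(Why?)} \end{aligned}$$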

Thus, we need to show that a set of the form {a : ω_f(a) ≥ r} is closed, where
r > 0 is fixed. Equivalently, we might show that the set {a : ω_f(a) < r} is open,
and this is easy. If x_0 ∈ {a : ω_f(a) < r}, that is, if ω_f(x_0) < r, then there is some
bounded open interval I containing x_0 such that ω(f; I) < r. (Why?) It follows
that I ⊂ {a : ω_f(a) < r}, since ω_f(x) ≤ ω(f; I) < r for any x ∈ I. □


EXERCISES 

1. If f is increasing, show that ω_f(a) = f(a+) − f(a−).

2. Prove that f is continuous at a if and only if ω_f(a) = 0.

3. Given f : R → R, show that g(x) = arctan f(x) satisfies D(g) = D(f).
Thus, in any discussion of D(f), we may assume that f is bounded.

> 4. Let f : [0, 1] → R be continuous, and let ε > 0. Show that there is an n ∈ N
such that ω(f; [(k − 1)/n, k/n]) < ε for all k = 1, ..., n.

> 5. If A is a subset of R and if x is in the interior of A, show that x is a point of
continuity for χ_A (the characteristic function of A). Are there any other points of
continuity?
6. Compute D(χ_Δ), where Δ is the Cantor set. If E is the set of all endpoints in Δ
(see Exercise 2.23), compute D(χ_{Δ\E}).

7. For which sets A is χ_A upper semicontinuous? lower semicontinuous?

8. Given any bounded function f, show that the function ω_f(x) is upper semicontinuous.

9. If E is a closed set in R, show that E = D(f) for some bounded function f.
[Hint: A sum of two characteristic functions will do the trick.]

10. Is every bounded continuous function on R uniformly continuous? 


Our earlier questions about the nature of D(f) can now be rephrased: Which subsets
of R can be written as a countable union of closed sets? In particular, is R \ Q such a
set? Conversely, is every countable union of closed sets the set of discontinuities for
some bounded function? Before we answer these questions, it might be helpful to have
a name for countable unions of closed sets (and the like).

A countable union of closed sets is called an F_σ set. Thus, the set of discontinuities
D(f) is an F_σ set. We might want to turn things around by taking complements, and so
we also name a countable intersection of open sets; these are called G_δ sets. The letter
F stands for fermé, or closed, while σ stands for somme, or sum. The letter G stands
for Gebiet, or region (besides, it comes after F), while δ stands for Durchschnitt, or
intersection. This is proof positive that both a Frenchman and a German had a say in
our notation!

The letters δ and σ represent operations performed on the underlying class of closed
sets F or on the class of open sets G. The result is often a new class of sets. For example,
note that we would get nothing new by considering F_δ sets because the intersection of
closed sets is again closed. In other words, F_δ = F. The same goes for G_σ sets. But we do
get something new by considering F_σ’s and G_δ’s. The set of rationals Q, for instance, is
an F_σ set, but it is obviously neither open nor closed. By taking complements, the set of
irrationals R \ Q is a G_δ set. We can continue this process (any combination producing
something new is of interest) and consider, say, F_σδ sets (countable intersections of
F_σ sets), G_δσ sets (countable unions of G_δ sets), and so on.
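For concreteness, here is the computation behind the claim about Q, written out with an enumeration (r_n) of the rationals. The enumeration is the only ingredient not stated above. Each singleton {r_n} is closed, so each R \ {r_n} is open:

$$\mathbb{Q} = \bigcup_{n=1}^{\infty} \{r_n\} \ \text{ is an } F_\sigma \text{ set}, \qquad \mathbb{R} \setminus \mathbb{Q} = \bigcap_{n=1}^{\infty} \big(\mathbb{R} \setminus \{r_n\}\big) \ \text{ is a } G_\delta \text{ set}.$$

Neither set is open or closed, because every open interval meets both Q and R \ Q.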


EXERCISES 

11. Show that every open interval (and hence every nonempty open set) in R is a 
countable union of closed intervals, and that every closed interval in R is a countable 
intersection of open intervals. 

12. More generally, in any metric space, show that every open set is an F_σ and that
every closed set is a G_δ.

13. If E is an F_σ set in R, is E = D(f) for some f? (The answer is yes, but this
is hard!)
The Baire Category Theorem

Recall that we have rephrased our earlier question about sets of discontinuity to read: 
Which subsets of R can be written as countable unions of closed sets? In particular, 
we asked whether R \ Q was such a set. Obviously, we can turn things around and ask 
whether Q is a countable intersection of open sets. Now any open set containing Q is 
dense in R, so we might first ask whether the countable intersection of dense open sets 
is still dense. The answer is yes: 

The Baire Category Theorem for R 9.3. If (G_n) is a sequence of dense, open
sets in R, then ⋂_{n=1}^∞ G_n ≠ ∅. In fact, ⋂_{n=1}^∞ G_n is dense in R.

Proof. Let x_0 ∈ R, and let I_0 be any open interval containing x_0. We will prove
both conclusions at once by showing that I_0 ∩ (⋂_{n=1}^∞ G_n) ≠ ∅.

Since G_1 is dense, we know that I_0 ∩ G_1 ≠ ∅. But since G_1 is also open,
this means that we can find some open interval I_1 ⊂ I_0 ∩ G_1. By shrinking I_1 (if
necessary), we may suppose that diam(I_1) < 1 and Ī_1 ⊂ I_0 ∩ G_1.

Now use I_1 in place of I_0 and G_2 in place of G_1. Since G_2 is dense, we have
I_1 ∩ G_2 ≠ ∅. But G_2 is open, so there is some open interval I_2 with diam(I_2) < 1/2
such that Ī_2 ⊂ I_1 ∩ G_2 ⊂ I_0 ∩ G_1 ∩ G_2.

Repeat this using I_2 and G_3 in place of I_1 and G_2, and so on. What we get is
a sequence of nested closed intervals Ī_1 ⊃ Ī_2 ⊃ ⋯ with diam(Ī_n) < 1/n and
Ī_n ⊂ I_0 ∩ (⋂_{k=1}^n G_k). Thus, by the nested interval theorem, I_0 ∩ (⋂_{k=1}^∞ G_k) ⊃
⋂_{n=1}^∞ Ī_n ≠ ∅. Consequently, ⋂_{n=1}^∞ G_n is nonempty and dense. □

Note that Baire’s theorem provides a new proof that R is uncountable. Indeed, if
R = {x_1, x_2, ...}, then each of the sets G_n = R \ {x_n} is open and dense (see Exercise 15);
but they also satisfy ⋂_{n=1}^∞ G_n = ∅, which contradicts Baire’s theorem.
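The construction in the proof of Theorem 9.3 can be imitated on a computer for finitely many sets G_n = R \ {x_n}. The Python sketch below is a hypothetical illustration with helper names of our own. It uses exact rationals and replaces the open intervals of the proof by closed outer thirds. At step n it keeps a closed subinterval of length at most 1/n that misses x_n. The final interval then misses every listed point. A finite list proves nothing about uncountability: that step needs infinitely many intervals and the nested interval theorem.

```python
from fractions import Fraction


def avoiding_intervals(points, lo=Fraction(0), hi=Fraction(1)):
    """Build nested closed intervals [a_n, b_n] with b_n - a_n <= 1/n such that
    the n-th point lies outside [a_n, b_n] (a finite imitation of the proof)."""
    a, b = Fraction(lo), Fraction(hi)
    intervals = []
    for n, x in enumerate(points, start=1):
        third = (b - a) / 3
        left, right = (a, a + third), (b - third, b)
        # The outer thirds are disjoint, so x misses at least one of them;
        # this stands in for "G_n = R minus {x_n} is open and dense".
        a, b = right if left[0] <= x <= left[1] else left
        # Shrink (if necessary) so that the length is at most 1/n.
        b = min(b, a + Fraction(1, n))
        intervals.append((a, b))
    return intervals


def first_rationals(max_denominator):
    """All fractions p/q in [0, 1] with 1 <= q <= max_denominator (repeats allowed)."""
    return [Fraction(p, q) for q in range(1, max_denominator + 1) for p in range(q + 1)]


points = first_rationals(20)
intervals = avoiding_intervals(points)
a, b = intervals[-1]
print(float(a), float(b - a), all(not (a <= x <= b) for x in points))
```

Exact Fraction arithmetic keeps the containment checks free of rounding error, even though the interval lengths become very small.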

We can push this observation a bit further. A dense G_δ subset of R must also be
an uncountable set. Here’s why: If (G_n) is a sequence of open dense sets in R and if
⋂_{n=1}^∞ G_n = {x_1, x_2, ...}, then the sets G′_n = G_n \ {x_n} are still open and dense, but
⋂_{n=1}^∞ G′_n = ∅, contrary to Baire’s theorem. Thus, ⋂_{n=1}^∞ G_n is uncountable. This is the
extra piece of information that we need to settle our original questions.

Corollary 9.4. Q cannot be written as the countable intersection of open subsets 
of R. 

Corollary 9.5. R \ Q ≠ D(f) for any f : R → R.

By rephrasing Baire’s theorem, we will be able to see another reason behind these 
last two corollaries. 

Corollary 9.6. If R = ⋃_{n=1}^∞ E_n, where each E_n is closed, then some E_n contains
an open interval.

Proof. Each of the sets G_n = R \ E_n is open in R and ⋂_{n=1}^∞ G_n = ∅. Thus, by
Baire’s theorem, some G_n is not dense. That is, some G_n misses an entire open
interval. In other words, some E_n contains an interval. □
Corollary 9.7. If R = ⋃_{n=1}^∞ E_n, then the closure of some E_n contains an interval;
that is, int(Ē_n) ≠ ∅ for some n. (Why?)

Corollary 9.8. If R \ Q = ⋃_{n=1}^∞ E_n, then the closure of some E_n contains an
interval.

How very different R \ Q and Q are! The rationals are somehow very “sparse” while
the irrationals are quite “thick.” To appreciate this difference, and to generalize Baire’s
theorem to metric spaces, will require some new terminology. To begin, recall that a
subset E of a metric space M is called nowhere dense in M if its closure Ē contains no nonempty
open set, that is, if the interior of Ē (in M) is empty. Judicious rewriting of this condition
might help. Note that E is nowhere dense if and only if Ē is nowhere dense (obviously),
and that Ē is nowhere dense if and only if the complement of Ē is dense (since every
open set has to hit (Ē)^c). Consequently, E is nowhere dense in M if and only if the
complement of Ē is an open, dense set in M.

Examples 9.9 

(a) N and Δ are nowhere dense in R. Also, any singleton {x} is nowhere dense in
R. But this is not the general case; {x}° = {x} can, and does, happen. How?

(b) Finite unions of nowhere dense sets are again nowhere dense (see Exercise 4.56). 
But a countable union of nowhere dense sets may fail to be nowhere dense. For 
example, Q is not nowhere dense in R. 

(c) We have no choice but to be fussy here; note that while N is nowhere dense in
R, it is not nowhere dense relative to N itself. In other words, we cannot ignore
the fact that we have defined the phrase “E is nowhere dense in M.” The closure
and the interior named in the definition refer to the closure and interior in M,
not in E.

(d) In an unfortunate fluke of language, “not nowhere dense” is not the same as 
“dense.” Indeed, (0, 1) is not nowhere dense in R, and yet it certainly is not 
dense in R. It may be easier to understand the difference if we recall that some 
authors use the phrase everywhere dense in place of the single word dense. An 
everywhere dense set is one that is dense in every open set (see Exercises 4.45 
and 4.46). A nowhere dense set, on the other hand, is one that is not dense in 
any open set (see Exercises 19 and 20, below). And so nowhere dense means 
“not even a little bit dense”! 

Given this terminology, we next define two categories, or types, of subsets of a metric 
space M. A subset A of M is said to be of the first category in M (or, a first category 
set relative to M) if A can be written as a countable union of sets, each of which is 
nowhere dense in M. For example, it follows that Q is a first category set in R. Some 
authors refer to first category sets as “meager” or “sparse” sets. 

The second category consists of all those sets that fail to be in the first category.
That is, a subset B of M is said to be of the second category in M if B is not of the
first category. In other words, B is a second category set in M if, whenever we write
B = ⋃_{n=1}^∞ E_n, some E_n fails to be nowhere dense in M; that is, int(Ē_n) ≠ ∅ for some
n. (Look familiar?)





Examples 9.10 

(a) In the language of category, Corollary 9.7 says that R is a second category set
in itself. And we could restate Corollary 9.8 by saying that R \ Q is a second
category set in R. The two categories of subsets of R provide yet another measure
of “big” versus “small.” A first category set in R, such as Q, is “small,” while a
second category set in R, such as R \ Q, is “big.”

(b) Again we will want to be careful. The two categories of subsets of M depend 
on the notion of nowhere dense sets, which in turn requires that we be precise 
about the host space M. For example, N is of the first category in R, but it is of 
the second category in itself. (Why?) In short, category is very relative. 

Finally we can state the general theorem. The proof is exactly the same as the one 
we gave for R; just repeat the proof of Theorem 9.3, using open balls instead of open 
intervals (and the nested set theorem in place of the nested interval theorem). 

The Baire Category Theorem 9.11. A complete metric space is of the second
category in itself. That is, if M is a complete metric space, and if we write
M = ⋃_{n=1}^∞ E_n, then the closure of some E_n contains an open ball. Equivalently, if (G_n)
is a sequence of dense open sets in M, then ⋂_{n=1}^∞ G_n ≠ ∅; in fact, ⋂_{n=1}^∞ G_n is
dense in M.

Note that we cannot expect a dense G_δ subset of a general metric space to be
uncountable because M itself may be only countable. The fact that a dense G_δ subset
of R is uncountable hinges on the observation that if G is open and dense in R, then so
is G \ {x} (see Exercise 15).

Baire’s theorem is often applied in existence proofs; after all, the conclusion is that 
some set is nonempty. We will see several applications of this principle later in the 
book. For now, let’s just highlight the key fact: 

Corollary 9.12. In a complete metric space, the complement of any first category 
set is nonempty. In fact, it is even dense. (Why?) 


EXERCISES 

Except where noted, M is an arbitrary metric space with metric d. 

> 14. Prove that A has an empty interior in M if and only if A^c is dense in M.

> 15. If G is open and dense in R, show that the same is true of G \ {x} for any x ∈ R.
Is this true in any metric space? Explain.

16. Show that {x} is nowhere dense in M if and only if x is not an isolated point of
M.

17. Prove that a complete metric space without any isolated points is uncountable.
In particular, this gives another proof that Δ is uncountable.

18. If A is either open or closed, show that bdry(A) is nowhere dense in M. Is the
same true of any set A?
19. Show that each of the following is equivalent to the statement that A is nowhere
dense in M:

(a) Ā contains no nonempty open set.

(b) Each nonempty open set in M contains a nonempty open subset that is disjoint
from A.

(c) Each nonempty open set in M contains an open ball that is disjoint from A.

20. If A is nowhere dense in M, and if G is a nonempty open set in M, prove that
A is nowhere dense in G.

21. If x_n → x in R, show that the set {x} ∪ {x_n : n ≥ 1} is nowhere dense in R.
Is the same true if R is replaced by an arbitrary metric space M? Is every countable
set nowhere dense? Explain.

22. Let (r_n) be an enumeration of Q. For each n, let I_n be the open interval centered
at r_n of radius 2^{−n}, and let U = ⋃_{n=1}^∞ I_n. Prove that U is a proper, open, dense subset
of R and that U^c is nowhere dense in R.

23. Is there a dense, open set in R with uncountable complement? Explain. 

24. Prove Corollary 9.7. 

25. Prove Corollary 9.8. Deduce that the conclusion of Baire’s theorem holds for 
R\Q. 

> 26. Prove Theorem 9.11. 

27. Let M be a complete metric space. If M = ⋃_{n=1}^∞ E_n, where each E_n is closed,
show that D = ⋃_{n=1}^∞ int(E_n) is dense in M. [Hint: “Estimate” M \ D.]

> 28. In a metric space M, show that any subset of a first category set is still
first category, and that a countable union of first category sets is again first
category.

> 29. In a metric space M, prove that any superset of a second category set is itself a
second category set.

> 30. Show that N is first category in R but second category in itself. 

> 31. Show that Q is first category in itself (thus, completeness is essential in Baire’s
theorem).

> 32. In R, show that any open interval (and hence any nonempty, open set) is a second 
category set. 

33. If M is complete, is every nonempty, open set a second category set?

34. Let M be complete, and let E be an F_σ set in M. Prove that E is a first category
set in M if and only if E^c is dense in M.

35. Let f : R → R. Show that f is discontinuous on a set of the first category in
R if and only if f is continuous at a dense set of points.

36. If M is complete, show that the complement of a first category set in M is a
dense set of the second category in M. In particular, a first category set in a complete
metric space must have empty interior.

37. Show that the complement of a first category set in R is uncountable. 
38. Is the complement of a first category set necessarily a second category set?
Likewise, is the complement of a second category set necessarily a first category set? 
Explain. 

39. When is a first category set an F_σ set? Equivalently, when is a set containing a
dense G_δ set itself a G_δ set?

40. Let f : R → R be a continuous function that is nonconstant on any interval.
If A is a second category set in R, show that f(A) is also second category.
[Hint: If B is closed and nowhere dense, show that f⁻¹(B) is closed and nowhere
dense.]

41. Let M be a complete metric space. Prove that if (E_n) is a sequence of closed
sets in M, each having empty interior, then ⋃_{n=1}^∞ E_n has empty interior.

42. While completeness is essential in the proof of Baire’s theorem, the conclusion
may still hold for some incomplete spaces. Show that it holds in N if we use the metric
d(m, n) = |m − n|/mn, but that (N, d) is not complete. [Hint: d is equivalent to
the usual metric. See Exercise 7.14.]

43. If N is homeomorphic to a complete metric space M, show that the conclusion
of Baire’s theorem holds in N. [Hint: Homeomorphisms preserve dense open sets.
Why?]

44. If M is complete, show that the conclusion of Baire’s theorem holds for any 
open subset of M . [Hint: See Exercise 7.30.] 

45. Fix n > 1, and let f : [a, b] → R^n be continuous and one-to-one. Show that
the range of f is nowhere dense in R^n. [Hint: The range of f is closed (why?); if it
has nonempty interior, then it contains a closed rectangle. Argue that this rectangle
is the image of some subinterval of [a, b].] Use this to show that R and R^n are not
homeomorphic for n > 1.

46. Show that R² cannot be written as a countable union of lines.

47. Let V be the vector space of all polynomials supplied with the norm
‖p‖ = max{|a_i| : i = 0, ..., n}, where p(x) = a_0 + a_1 x + ⋯ + a_n x^n ∈ V. Show that V
is not complete.

48. If W is a proper, closed, linear subspace of a normed vector space V, show that
W is nowhere dense in V. [Hint: If W ⊃ B_r(x), then W ⊃ nB_r(0) for every n.
Why?]

49. Let V be an infinite-dimensional normed vector space, and suppose that
V = ⋃_{n=1}^∞ W_n, where each W_n is a finite-dimensional subspace of V. Prove that V is not
complete.

50. Let M be a separable metric space, and let S be a subset of M. A point x ∈ S
is said to be a point of first category relative to S if, for some neighborhood U of
x, the set U ∩ S is of first category in M. If S_0 is the set of points of first category
relative to S, show that S_0 is of first category in M. [Hint: M has a countable open
base.]


Notes and Remarks

Baire’s result (for R^n) appears in his thesis, Baire [1899]. An early (and less explicit)
version of the category theorem appeared in Osgood [1897]. See Hawkins [1970] and
Hobson [1927] for more details on Osgood’s contribution.

Exercise 22 is adapted from Wilansky [1953b]. Diamond and Gelles [1984, 1985]
discuss certain relations that exist among the various notions of “big” and “small”
sets that we have encountered (and even more that we haven’t!). The result stated in
Exercise 50 is from Banach [1930], but see also Kuratowski [1966]. The bible for all
matters categorical is Oxtoby [1971].

As mentioned earlier in this chapter, Baire’s theorem has lots of applications. Here
is one example (with a few details to check). The characteristic function of the rationals
χ_Q is not the limit of a sequence of continuous functions. Suppose, to the contrary,
that there is a sequence (f_n) of continuous functions such that χ_Q(x) = lim_{n→∞} f_n(x) for
each x ∈ R. Then, the set A_n = {x : f_n(x) > 1/2} is open for each n and, hence, so is
G_n = ⋃_{k=n}^∞ A_k = {x : f_k(x) > 1/2 for some k ≥ n}. But then, ⋂_{n=1}^∞ G_n = {x : f_n(x) >
1/2 for infinitely many n} = Q (why?), and this contradicts Corollary 9.4. This example
illustrates a special case of a deep result, due to both Baire and Osgood, stating that any
function f : R → R that is the limit of a sequence of continuous functions must have a
point of continuity. Various incarnations of the theorem are discussed in greater detail
in Goffman [1953a], Hobson [1927], and Munroe [1965]. Myerson [1991] discusses
the related problem of finding a sequence of continuous functions whose pointwise
limit is finite on Q and infinite on R \ Q. We will discuss several applications of Baire’s
theorem in Part Two, where we will give a proof of the Baire-Osgood theorem and
further details on the set of discontinuities D(f) of a bounded function (especially
concerning Exercises 9 and 13).
Historical Background

Unarguably, modern analysis was formed during the resolution of an important controversy
(or, rather, controversies) concerning the representation of “arbitrary” functions.
This controversy has unfolded slowly over the last two centuries and was put to its final
rest only in our own time.

The story begins in 1746 with the famous vibrating string problem. Briefly, an elastic
string of length L has each end fastened to one of the endpoints of the interval [0, L] on
the x-axis and is set into motion (as you might pluck a guitar string, for example). The
problem is to determine the position y = F(x, t) of the string at time t, given only its
initial position y = f(x) = F(x, 0) at time t = 0 where, for simplicity, we assume that
the initial velocity F_t(x, 0) = 0. The function F(x, t) is the solution to d’Alembert’s
wave equation: F_tt = a²F_xx, where a is a positive constant determined by certain
physical properties of the string. The initial data for the problem is F(x, 0) = f(x),
F_t(x, 0) = 0, and f(0) = 0 = f(L).

The controversy, initially between d’Alembert and Euler, centers around the nature
of the functions f that may be permitted as initial positions. D’Alembert argued that
the initial position f must be “continuous” (in the sense that f must be given by a
single analytical expression or “formula”), while Euler insisted that f could be
“discontinuous” (the initial position might be a series of straight line segments, as when
the string is plucked in two or more places at once; in other words, a composite of two
or more “formulas”).

Now it is not hard to find particular solutions to the wave equation. Indeed, note that
each of the functions F(x, t) = sin(kπx/L) cos(akπt/L), k = 1, 2, 3, ..., is a solution
with corresponding initial position F(x, 0) = sin(kπx/L). If we assume the validity
of term-by-term differentiation (that is, the “superposition” of solutions), this would
suggest that any sum of the form

$$F(x, t) = \sum_{k=1}^{\infty} a_k \sin(k\pi x/L)\cos(ak\pi t/L) \qquad (10.1)$$

is also a solution. In 1753, Daniel Bernoulli entered into the controversy by claiming
that equation (10.1) is the most general solution to the vibrating string problem. Euler
immediately took exception to Bernoulli’s solution for, if we accept equation (10.1) as
the general solution, it follows that the initial position f must satisfy

$$f(x) = \sum_{k=1}^{\infty} a_k \sin(k\pi x/L). \qquad (10.2)$$

In other words, Bernoulli’s solution suggests that the initial position f can always be
represented by a sine series of the form (10.2). As Euler pointed out, the sum in
equation (10.2) is odd and periodic, whereas no such assumptions can be made on f. (Since
a “function” was understood to be a “formula,” it was believed that the behavior of a
function on an interval completely determined its behavior on the whole line.) Besides,
it was inconceivable that a “discontinuous” initial position could be written as the sum
of “continuous” functions. Bernoulli’s arguments, which were based largely on physical
principles, were unconvincing. His solution was rejected by most mathematicians of
the time, including Euler and d’Alembert.
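The claim that each sin(kπx/L) cos(akπt/L) solves F_tt = a²F_xx can be checked numerically. The Python sketch below is a hypothetical illustration: the helper names and the use of central differences with a fixed step are assumptions of this sketch. It estimates the residual F_tt − a²F_xx at one point for a finite superposition, which is small. For a function built with the wrong speed, the residual is large. A finite-difference check is evidence, not a proof.

```python
import math


def standing_wave(x, t, k, a=1.0, L=1.0):
    """The particular solution sin(k*pi*x/L) * cos(a*k*pi*t/L)."""
    return math.sin(k * math.pi * x / L) * math.cos(a * k * math.pi * t / L)


def wave_residual(F, x, t, a=1.0, step=1e-4):
    """Central-difference estimate of F_tt - a^2 F_xx at the point (x, t)."""
    F_tt = (F(x, t + step) - 2 * F(x, t) + F(x, t - step)) / step**2
    F_xx = (F(x + step, t) - 2 * F(x, t) + F(x - step, t)) / step**2
    return F_tt - a**2 * F_xx


# A finite superposition with wave speed a = 2 (coefficients chosen arbitrarily).
F = lambda x, t: 0.7 * standing_wave(x, t, 1, a=2.0) - 0.2 * standing_wave(x, t, 3, a=2.0)
print(wave_residual(F, 0.3, 0.7, a=2.0))
```

Superposition works here only because the sum is finite. Term-by-term differentiation of an infinite series is exactly the step the text flags as an assumption.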

Controversy over the solution to the vibrating string problem would rage on for another
20 years and would come to involve several mathematicians, including Lagrange
and Laplace.

The plot thickened in 1807, when Joseph Fourier resurrected Bernoulli’s assertion.
Fourier presented a paper on heat transfer in which he was able to solve for the
steady-state temperature T(x, y) of a rectangular metal plate with one edge placed on the
interval [−L, L] on the x-axis, and where the initial temperature along this edge f(x) =
T(x, 0) is known but is again “arbitrary.” Fourier’s solution is based on the premise that
an arbitrary function f can be represented as a series of the form

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \big(a_n \cos(n\pi x/L) + b_n \sin(n\pi x/L)\big).$$

Moreover, if the interval in question is instead [ 0, L ], then it suffices to use only sines 
(as in Bernoulli’s series) or only cosines in the representation. 

If, for simplicity, we take L = π, then the Fourier series for f over the interval
[−π, π] is given by

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx). \qquad (10.3)$$


Fourier justified this equation in much the same way that Euler and Lagrange had
done before him; he argued that if the Fourier coefficients a_0, a_1, ..., b_1, b_2, ... could
actually be determined, that is, if equation (10.3) could be solved, then it must be valid.
To determine b_m, for example, we simply multiply both sides of equation (10.3) by
sin mx and integrate over the interval [−π, π] to obtain



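$$\begin{aligned}
\int_{-\pi}^{\pi} f(x)\sin mx\,dx
&= \int_{-\pi}^{\pi} \Big[\frac{a_0}{2}\sin mx + \sum_{n=1}^{\infty} \big(a_n \cos nx \sin mx + b_n \sin nx \sin mx\big)\Big]\,dx \\
&= \frac{a_0}{2}\int_{-\pi}^{\pi} \sin mx\,dx + \sum_{n=1}^{\infty} a_n \int_{-\pi}^{\pi} \cos nx \sin mx\,dx + \sum_{n=1}^{\infty} b_n \int_{-\pi}^{\pi} \sin nx \sin mx\,dx \\
&= b_m \int_{-\pi}^{\pi} \sin^2 mx\,dx = b_m \pi,
\end{aligned}$$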


since all of the remaining integrals are zero. A similar calculation shows that a_m =
(1/π) ∫_{−π}^{π} f(x) cos mx dx. Thus, if we assume the existence of the various integrals
in this calculation, and if we assume that term-by-term integration of the series is
permitted, then equation (10.3) can be solved.
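The coefficient formulas a_m = (1/π) ∫_{−π}^{π} f(x) cos mx dx and b_m = (1/π) ∫_{−π}^{π} f(x) sin mx dx can be tried out numerically. In the Python sketch below, the integrals are approximated by the rectangle rule on equally spaced points over one period. That quadrature choice, the helper name, and the test function trig_poly are assumptions of this sketch. For a trigonometric polynomial of low degree the rectangle rule happens to be exact up to rounding, so the known coefficients are recovered. For a general f it is only an approximation.

```python
import math


def fourier_coefficients(f, m, samples=256):
    """Approximate a_m = (1/pi) * integral of f(x) cos(mx) and b_m = (1/pi) * integral
    of f(x) sin(mx) over [-pi, pi] by the rectangle rule on equally spaced points."""
    h = 2 * math.pi / samples
    xs = [-math.pi + k * h for k in range(samples)]
    a_m = sum(f(x) * math.cos(m * x) for x in xs) * h / math.pi
    b_m = sum(f(x) * math.sin(m * x) for x in xs) * h / math.pi
    return a_m, b_m


def trig_poly(x):
    # a_0 = 2 (so a_0/2 = 1), a_2 = 0.5, b_1 = 3, b_3 = -2; all other coefficients are 0.
    return 1 + 3 * math.sin(x) + 0.5 * math.cos(2 * x) - 2 * math.sin(3 * x)


for m in range(5):
    print(m, fourier_coefficients(trig_poly, m))
```

The recovered values reflect the orthogonality relations used above: every product integral vanishes except the one with matching frequency.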

Fourier’s real innovation was not in his verification of equation (10.3) (in fact, his
calculations were considered to be clumsy and nonrigorous) but rather in its
interpretation. Fourier argued that the Fourier coefficients of an arbitrary (but presumably
bounded) function could always be determined by interpreting πb_m, for example, as
the area bounded by the graph of y = f(x) sin mx and the x-axis between x = −π and
x = π. In other words, he transformed the question of existence of the series
representation into the geometrically obvious “fact” that the area under a curve can always be
computed.

But, as we will see later, it is not at all clear how to define the integral of an “arbitrary” 
function. Moreover, term-by-term integration (that is, the interchange of limits) is not 
so easy to justify - the question of convergence of the series enters the picture. For 
these reasons, Fourier’s work was not well received and his ideas on trigonometric 
series went unpublished until the appearance of his classic book, Théorie Analytique
de la Chaleur, in 1822.

In particular, Fourier’s methods allow for a discontinuous function to be written as a
sum of continuous functions (in the modern sense of the words; see Exercise 3), which
was an unthinkable consequence at the time. It was so unthinkable that Cauchy was
prompted to set the record straight in his famous Cours d’Analyse of 1821. Cauchy’s
refutation of Fourier’s results, often called Cauchy’s wrong theorem, states that a
convergent sum of continuous functions must again be a continuous function. (The problem,
as we will see, comes in the interpretation of the word “convergent.”) Nevertheless,
Fourier’s methods seemed to work. In fact, the general consensus at the time was that
both Cauchy and Fourier were right, although a few details would obviously have to
be straightened out; this was an uncomfortable point of view in the newly born age of
rigor.

As early as 1826, Abel noted that there were exceptions to Cauchy’s theorem and
attempted to find the “safe domain” of Cauchy’s results. But the latent contradiction 
in Cauchy’s theorem was not fully revealed until 1847, when Seidel discovered the 
hidden assumption in Cauchy’s proof and, in so doing, introduced the concept of 
uniform convergence. 

Although Fourier was never able to fully justify his less than rigorous arguments, 
the questions raised by his work would inspire mathematicians for years to come. To 
quote a recent article by González-Velasco:

It was the success of Fourier’s work in applications that made necessary a
redefinition of the concept of function, the introduction of a definition of convergence,
a reexamination of the concept of integral, and the ideas of uniform continuity 
and uniform convergence. It also provided motivation for the theory of sets, was 
in the background of ideas leading to measure theory, and contained the germs 
of the theory of distributions. 


EXERCISES 

1. Let f(x) and g(x) be any two distinct choices from the list 1, cos x,
sin x, cos 2x, sin 2x, ..., cos nx, sin nx. Show that ∫_{−π}^{π} f(x)g(x) dx = 0 while
∫_{−π}^{π} f(x)² dx ≠ 0.

2. Use the result in Exercise 1 to conclude that the functions 1, cos x, sin x, cos 2x,
sin 2x, ..., cos nx, and sin nx are linearly independent.

3. Here is one of Fourier’s examples: Consider the “square wave” shown in
Figure 10.1. (By including the vertical segments in the graph, Fourier imagined this
as the graph of a continuous function.) Show that the Fourier series for this function is
given by Σ_{n=1}^∞ (2n − 1)^{−1} sin(2n − 1)x. [Hint: Do a purely “formal” calculation of the Fourier
coefficients, choosing any function values you find convenient at the points 0, ±π, ...
(note that the series vanishes at each of these points). This same example points up
another source of controversy in Fourier’s work: Does term-by-term differentiation
of this series produce a series representing the derivative of the “square wave”?]
4. Let f : R → R be twice continuously differentiable and 2π-periodic. It follows
that f′ and f″ are both 2π-periodic and bounded. (Why?)

(a) Use integration by parts to show that the Fourier coefficients of f satisfy |a_n| ≤
C/n and |b_n| ≤ C/n, for some constant C and all n ≥ 1, and hence that a_n → 0
and b_n → 0.

(b) Repeat the calculation in (a) to show that |a_n| ≤ C/n² and |b_n| ≤ C/n², for
some constant C and all n ≥ 1. Use this to conclude that the Fourier series for f
converges at each point of R. (It must, in fact, converge to f, but this is somewhat
harder to show.)
Pointwise and Uniform Convergence


We began our study of metric spaces in Chapter Three under the premise that such
abstractions would contribute to our understanding of limits, derivatives, integrals, and
sums (in other words, calculus). And while we have seen a few instances of this, we
have yet to speak at any length about our very first example: the metric space C[0, 1].
As we saw in Chapter Five, this is a space that we need to master.

In the next few chapters we will focus our attention on C[0, 1] and some of its
relatives. We will want to answer all of the same questions about C[0, 1] that we have
asked of every other metric space: What are its open sets? its compact sets? Is C[0, 1]
complete? Is it separable? And on and on. You name it, we want to know it.

The very first question we need to tackle is this: What does it mean for a sequence
of functions to converge? There are many reasonable answers to this question, and we
will talk about several before we are done, but only one will “do the right thing” in
C[0, 1]. For instance, given a sequence (f_n) of real-valued functions defined on [0, 1],
we might consider the sequence of real numbers (f_n(x))_{n=1}^∞ for each fixed x in [0, 1]
and ask whether this sequence always converges. Or we might simply consider (f_n) as
a sequence of points in the metric space C[0, 1] and ask whether (f_n) converges in the
usual metric of C[0, 1]. Both alternatives have their place in analysis, and both have
their merits, but, for C[0, 1] at least, the second alternative is more appropriate.

To get a handle on this, we will want to examine both types of convergence in
a variety of settings. The first type of convergence, called pointwise convergence, is
somewhat easier to work with and, historically, is the older and more natural notion of
convergence. Let’s start there.


Examples 10.1 

(a) Our first example takes us all the way back to Chapter One. Recall that for each fixed $x \in \mathbb{R}$, the sequence $\big((1 + (x/n))^n\big)_{n=1}^{\infty}$ converges to $e^x$ as $n \to \infty$. In other words, the sequence of polynomials $f_n(x) = (1 + (x/n))^n$ converges pointwise to $f(x) = e^x$ on $\mathbb{R}$. Now this particular sequence of functions is rather well behaved: for example, recall from Exercise 1.18 that $(1 + (x/n))^n$ increases to $e^x$. And, by way of bringing some calculus into the discussion, notice that for any fixed $x$ we have


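$$\frac{d}{dx}\Big[\Big(1 + \frac{x}{n}\Big)^{n}\Big] = \Big(1 + \frac{x}{n}\Big)^{n-1} \longrightarrow e^{x} = \frac{d}{dx}\,e^{x}$$

(as $n \to \infty$) and also

$$\int_0^1 \Big(1 + \frac{x}{n}\Big)^{n} dx = \frac{n}{n+1}\Big[\Big(1 + \frac{1}{n}\Big)^{n+1} - 1\Big] \longrightarrow e - 1 = \int_0^1 e^{x}\,dx.$$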
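A quick numerical check of Example 10.1 (a) may help. The sketch below is plain Python; the names `f` and `integral_f` are illustrative choices of ours, not notation from the text. It evaluates $(1 + x/n)^n$ at the fixed point $x = 2$ for increasing $n$, and computes $\int_0^1 (1 + x/n)^n\,dx$ from the antiderivative $\frac{n}{n+1}(1 + x/n)^{n+1}$.

```python
import math

def f(n, x):
    """The n-th term (1 + x/n)**n of the sequence in Example 10.1 (a)."""
    return (1.0 + x / n) ** n

def integral_f(n):
    """Integral of (1 + x/n)**n over [0, 1], via the antiderivative n/(n+1) * (1 + x/n)**(n+1)."""
    return n / (n + 1) * ((1.0 + 1.0 / n) ** (n + 1) - 1.0)

x = 2.0
ns = [1, 10, 100, 1000, 10000]
values = [f(n, x) for n in ns]

# For this fixed x the values increase toward e**x ...
print(values, math.exp(x))
# ... and the integrals approach e - 1, the integral of e**x over [0, 1].
print([integral_f(n) for n in ns], math.e - 1)
```

Floating-point arithmetic only illustrates the limits; it proves nothing about them.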


(b) For each $n$, let $g_n : [0,1] \to \mathbb{R}$ be the function whose graph is shown in Figure 10.2 ($g_n$ is 0 outside the interval $[0, 1/n]$). Then, for each $x \in [0,1]$, the sequence $g_n(x) \to 0$ as $n \to \infty$. Indeed, $g_n(0) = 0$ for any $n$, while if $x > 0$, then $g_n(x) = 0$ whenever $n > 1/x$. We say that $g_n \to 0$ pointwise on $[0,1]$. But notice that $\int_0^1 g_n = 1 \not\to 0$. What happened? Integration is supposed to be continuous!



[Figure 10.2: the graph of $g_n$, a spike of height $2n$ supported on $[0, 1/n]$.]
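Since the figure is not reproduced here, the sketch below assumes that $g_n$ is a symmetric triangle with peak value $2n$ at $x = 1/(2n)$, which is consistent with the facts $g_n(1/2n) = 2n$ and $\int_0^1 g_n = 1$ used in the text. The helper `midpoint_integral` is a hypothetical name for an ordinary composite midpoint rule.

```python
def g(n, x):
    """Assumed shape of g_n: triangle of height 2n at x = 1/(2n), zero outside [0, 1/n]."""
    peak = 1.0 / (2 * n)
    if 0.0 <= x <= peak:
        return 4.0 * n * n * x                  # rising edge, slope 4n^2
    if peak < x <= 1.0 / n:
        return 4.0 * n * n * (1.0 / n - x)      # falling edge, slope -4n^2
    return 0.0

def midpoint_integral(func, a, b, m=100_000):
    """Composite midpoint rule with m subintervals."""
    h = (b - a) / m
    return h * sum(func(a + (i + 0.5) * h) for i in range(m))

# Pointwise: at a fixed x > 0 the values vanish as soon as n > 1/x.
print([g(n, 0.3) for n in (1, 2, 3, 4, 100)])
# Integrals: each spike has area (1/2)(1/n)(2n) = 1.
print([round(midpoint_integral(lambda x: g(n, x), 0.0, 1.0), 6) for n in (1, 5, 50)])
```

Each column (fixed $x$) tends to 0, yet every row integrates to 1: the mass escapes into an ever narrower spike near 0.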


(c) Consider the sequence of functions $h_n : [0,1] \to \mathbb{R}$ given by $h_n(x) = x^{n+1}/(n+1)$. Again, $h_n \to 0$ pointwise on $[0,1]$; in fact, $|h_n(x)| \le 1/(n+1) \to 0$ as $n \to \infty$ for any $x$ in $[0,1]$. But now what about $h_n'(x) = x^n$? Well, $h_n'(1) = 1$ for any $n$, and if $0 \le x < 1$, then $\lim_{n\to\infty} h_n'(x) = \lim_{n\to\infty} x^n = 0$; that is, $(h_n')$ tends pointwise to the function $k$ defined by $k(x) = 0$ for $0 \le x < 1$ and $k(1) = 1$. In particular,

$$\lim_{n\to\infty} h_n'(1) = 1 \ne 0 = \Big(\frac{d}{dx}\lim_{n\to\infty} h_n(x)\Big)\Big|_{x=1}.$$

Isn’t this annoying? To make matters worse, notice that the limit function $k$ isn’t even continuous. What’s wrong?
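The contrast in Example 10.1 (c) can be seen directly: on a grid of $[0,1]$ the largest value of $|h_n|$ is $1/(n+1)$ (attained at $x = 1$), while $h_n'(1) = 1$ for every $n$ and $h_n'(x) \to 0$ for $x < 1$. A minimal sketch; the grid is our own illustrative choice.

```python
def h(n, x):
    """h_n(x) = x**(n+1) / (n+1) from Example 10.1 (c)."""
    return x ** (n + 1) / (n + 1)

def h_prime(n, x):
    """Its derivative, h_n'(x) = x**n."""
    return x ** n

grid = [i / 1000 for i in range(1001)]          # 0, 0.001, ..., 1
for n in (1, 10, 100, 1000):
    sup_h = max(abs(h(n, x)) for x in grid)      # attained at x = 1
    print(n, sup_h, h_prime(n, 1.0), h_prime(n, 0.99))
```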

(d) The pointwise limit of a sequence of functions has come up several times in our discussions of $\ell_1$, $\ell_2$, and $\ell_\infty$ under the alias “coordinatewise” convergence. For example, recall that in our proof that $\ell_2$ is complete we found a candidate for the limit of a Cauchy sequence in $\ell_2$ by first computing the pointwise limit of the sequence. That is, a sequence $(f_n)$ in $\ell_2$ is really a sequence of functions on $\mathbb{N}$, and so we may consider their pointwise limit $f(k) = \lim_{n\to\infty} f_n(k)$ for $k \in \mathbb{N}$. A similar device was used in Example 7.8, where we noted that the sequence $f_n = (1, \ldots, 1, 0, \ldots) \in \ell_\infty$ (where the first $n$ entries are 1 and the rest are 0) converges pointwise on $\mathbb{N}$ to $f = (1, 1, \ldots)$ (all 1) but that this pointwise limit is not a limit in the metric of $\ell_\infty$. A more familiar example is provided by the ubiquitous sequence $(e_n)$. We noted in Chapter Three that $(e_n)$ tends pointwise to 0 on $\mathbb{N}$ but not in the metric of any of the spaces $\ell_1$, $\ell_2$, or $\ell_\infty$. Indeed, as we pointed out at the time, convergence in any of these spaces is “stronger” than pointwise convergence in the sense that convergence in the norm of $\ell_1$, $\ell_2$, or $\ell_\infty$ implies coordinatewise or pointwise convergence on $\mathbb{N}$, but not conversely. (See the discussion immediately preceding Exercise 3.40 and Exercise 3.40 itself for a positive result in this vein.)

(e) A similar line of reasoning applies to $\mathbb{R}^n$ as well. In this case we might consider an element of $\mathbb{R}^n$ as a function on the set $\{1, \ldots, n\}$ (as we did in our discussion of $C(M)$, where $M$ is a finite set, at the end of Chapter Five). In $\mathbb{R}^n$, of course, coordinatewise convergence of sequences coincides with convergence in any norm. (Why?)

Our first three examples concerned the interchange of limits, as in $\lim_{n\to\infty} \int f_n = \int \lim_{n\to\infty} f_n$. While the interchange of pointwise limits worked just fine in Example 10.1 (a), it failed miserably in the next two examples. The interchange of limits typically requires something more than just pointwise convergence. In any case, pointwise convergence is evidently not the “right” mode of convergence for $C[0,1]$, because we already know that integration acts continuously on $C[0,1]$ and so should commute with a limit in the metric of $C[0,1]$. Before we say more, let’s examine the formal definition of pointwise convergence.

Let $X$ be any set, let $(Y, \rho)$ be a metric space, and let $f$ and $(f_n)$ be functions mapping $X$ into $Y$. We say that the sequence $(f_n)$ converges pointwise to $f$ on $X$ if, for each $x \in X$, the sequence $(f_n(x))$ converges to $f(x)$ in $Y$. That is,

$(f_n)$ converges pointwise to $f$ on $X$ if, for each point $x \in X$ and for each $\varepsilon > 0$, there is an integer $N \ge 1$ (which depends on both $x$ and $\varepsilon$) such that $\rho(f_n(x), f(x)) < \varepsilon$ whenever $n \ge N$.

Please note that since we are interested only in the distance between function values, pointwise convergence has very little to do with the domain space $X$; all we need is a distance function on (and, hence, a notion of convergence in) the target space $Y$. In discussing pointwise convergence, you may find it helpful to think of a sequence of functions $(f_n)$ as simply a “table” of values, with $n$ determining the “rows” and each $x \in X$ determining a “column.” The values $f_1(x)$, as $x$ ranges over $X$, are put in the first row; the values $f_2(x)$, for $x \in X$, are put in the second row; and so on. To say that $(f_n)$ converges pointwise means that each “column” of values, taken one at a time, converges (as $n \to \infty$).

Also notice that since the convergence of a sequence $(f_n(x))$ is tested at each fixed $x$, one $x$ at a time, the rate of convergence $N = N(x, \varepsilon)$ at one $x$ may be vastly different than at another $x$. In our “tabular” framework this means that nearby rows in the table formed by a pointwise convergent sequence of functions might be very different when compared over all $x$. All we can say with certainty is that the entries in a single column eventually begin to look alike, provided that we read beyond some $N$th row, and just how far down the column we have to read before this happens may vary with each column or $x$ value. This point is well illustrated by several of our earlier examples; let’s take another look:
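To see how $N(x, \varepsilon)$ can vary from column to column, take the sequence $x^n$ on $[0, 1)$ (the derivatives $h_n'$ of Example 10.1 (c)). The sketch below finds, for a few columns $x$, the first row $N$ beyond which every entry is within $\varepsilon = 0.01$ of the limit 0. The function name `first_good_index` is ours.

```python
def first_good_index(x, eps):
    """Smallest N with x**n < eps for all n >= N, assuming 0 <= x < 1.

    Since x**n decreases in n, the first n with x**n < eps works for all later n.
    """
    n = 1
    while x ** n >= eps:
        n += 1
    return n

# The column for x = 0.5 settles quickly; columns near x = 1 take far longer.
for x in (0.5, 0.9, 0.99, 0.999):
    print(x, first_good_index(x, 0.01))
```

No single $N$ serves every column of $[0, 1)$, since $N(x, 0.01) \to \infty$ as $x \to 1$.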

Examples 10.2 

(a) While the sequence $f_n(x) = (1 + (x/n))^n$ converges pointwise on $\mathbb{R}$ to $f(x) = e^x$, note that since each $f_n$ is a polynomial in $x$, each is necessarily unbounded for large $|x|$. In particular, for $n$ fixed, $|(1 + (x/n))^n| \to \infty$ as $x \to -\infty$, while $e^x \to 0$ as $x \to -\infty$. Thus, for any fixed $n$, we have $|f_n(x) - f(x)| \to \infty$ as $x \to -\infty$. A more delicate calculation (with $n$ still fixed) will also show that $|f_n(x) - f(x)| \to \infty$ as $x \to \infty$. Just how large to take $x$ before, say, $|f_n(x) - f(x)| > 1$ will vary with each $n$.

(b) Consider the sequence $(g_n)$ of Example 10.1 (b). Although $g_n(x) \to 0$ as $n \to \infty$ for each fixed $x$, there are plenty of $x$ for which an individual $g_n(x)$ is far from 0. In particular, $g_n(1/2n) = 2n \to \infty$. (At $x = 1/2n$, we would need $N > 1/x = 2n$ to have $g_N(x) = 0$.)

(c) Next consider the sequence $k_n(x) = x^n$ on $[0,1]$. Pictured in Figure 10.3 are the graphs of $k_n$ for $n = 1, 2, 4, 6$, and $16$. As noted earlier, $k_n(1) = 1$ for every $n$, while $k_n(x) \to 0$ for $x < 1$. That is, $(k_n)$ converges pointwise to the function $k$ in Example 10.1 (c). But notice, too, that near $x = 1$ each $k_n(x)$ is necessarily far from 0. In fact, $k_n(1/\sqrt[n]{2}) = 1/2$ for every $n$, while $\sqrt[n]{2} \to 1$ as $n \to \infty$.



Now that we have had a chance to play around with an inappropriate mode of convergence in $C[0,1]$, let’s see if we can do better. We already know a metric on $C[0,1]$, and so we know what it means for a sequence $(f_n)$ in $C[0,1]$ to converge to a function $f$ in the metric of $C[0,1]$; it means that $\|f_n - f\|_\infty \to 0$ as $n \to \infty$. That is, $\sup_{0 \le x \le 1} |f_n(x) - f(x)| \to 0$ as $n \to \infty$. If we expand this into an “$\varepsilon$, $N$” statement, we will be able to compare it with the definition of pointwise convergence:

$f_n \to f$ in the norm of $C[0,1]$ if, for every $\varepsilon > 0$, there is some $N$ (which may depend on $\varepsilon$) such that $\sup_{0 \le x \le 1} |f_n(x) - f(x)| < \varepsilon$ for all $n \ge N$.

And now let’s remove that supremum:

$f_n \to f$ in the norm of $C[0,1]$ if, for every $\varepsilon > 0$, there is some $N$ (which may depend on $\varepsilon$) such that $|f_n(x) - f(x)| < \varepsilon$ for all $0 \le x \le 1$ and all $n \ge N$.

In other words, the inequality $|f_n(x) - f(x)| < \varepsilon$ is to hold uniformly in $x$ (for large $n$).

Again appealing to our “tabular” analogy, the table for a sequence $(f_n)$ that converges in the norm of $C[0,1]$ has the property that all of the rows, beyond some $N$th row, are uniformly similar, independent of the columns. The key, of course, is the sup-norm; we have insisted that the maximum pointwise difference between $f_n$ and $f$ be made small. To put this in more familiar terms, recall that $(f_n)$ converges to $f$ in the metric of $C[0,1]$ if $(f_n)$ is eventually in $B_\varepsilon(f) = \{g \in C[0,1] : \|f - g\|_\infty < \varepsilon\}$, and that $B_\varepsilon(f)$ is the set of functions in $C[0,1]$ whose graphs stay within a maximum vertical distance of less than $\varepsilon$ from the graph of $f$. Another picture might help; see Figure 10.4.



The shaded region in Figure 10.4 (a) is the set $\{(x, y) : |y - f(x)| < \varepsilon\}$. A function $g \in C[0,1]$ is in $B_\varepsilon(f)$ precisely when its graph lies within this region, as depicted in Figure 10.4 (b).

Let’s recall our first few examples. For the sequence $(g_n)$ in Example 10.1 (b) we have $\|g_n\|_\infty = \|g_n - 0\|_\infty = 2n \not\to 0$. Thus, while $(g_n)$ does converge pointwise to 0 on $[0,1]$, it does not converge to 0 in the metric of $C[0,1]$. In fact, $(g_n)$ cannot converge to any function in the metric of $C[0,1]$ since it is not a bounded sequence in $C[0,1]$. For the sequence $(h_n)$ in Example 10.1 (c) we have $\|h_n\|_\infty = 1/(n+1) \to 0$, and hence $(h_n)$ converges to 0 in the metric of $C[0,1]$. Finally, the sequence $(k_n)$ of Example 10.2 (c) does not converge to any function in $C[0,1]$ (the function $k$ certainly is not a candidate since it is not continuous). Why? Because $(k_n)$ is not a Cauchy sequence in $C[0,1]$. Indeed,

$$\|k_n - k_{2n}\|_\infty \ge \big|k_n(1/\sqrt[n]{2}) - k_{2n}(1/\sqrt[n]{2})\big| = (1/2) - (1/4) = 1/4.$$
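The two computations behind $(k_n)$ failing to be Cauchy, $k_n(2^{-1/n}) = 1/2$ and $k_{2n}(2^{-1/n}) = 1/4$, can be checked in floating point for a few values of $n$. This is only a sanity check of the arithmetic; `k` is our name for $k_n(x) = x^n$.

```python
def k(n, x):
    """k_n(x) = x**n from Example 10.2 (c)."""
    return x ** n

for n in (1, 2, 4, 6, 16, 1000):
    x_n = 2.0 ** (-1.0 / n)                  # the point 1 / 2**(1/n), which tends to 1
    gap = k(n, x_n) - k(2 * n, x_n)           # a lower bound for ||k_n - k_2n||_inf
    print(n, round(x_n, 6), round(k(n, x_n), 12), round(gap, 12))
```

The points $2^{-1/n}$ creep toward 1 while the gap stays at $1/4$, which is exactly why no uniform limit can exist.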

Convergence in the metric of $C[0,1]$ is called uniform convergence. It has little to do with continuous functions and a lot to do with the sup-norm (which, for this reason, is sometimes called the uniform norm). The formal definition should explain everything.

Let $X$ be any set, let $(Y, \rho)$ be a metric space, and let $f$ and $(f_n)$ be functions mapping $X$ into $Y$. We say that the sequence $(f_n)$ converges uniformly to $f$ on $X$ if, for each $\varepsilon > 0$, there is some $N \ge 1$ (which may depend on $\varepsilon$) such that $\rho(f_n(x), f(x)) < \varepsilon$ for all $x \in X$ and all $n \ge N$.

To highlight the fact that $\rho(f_n(x), f(x))$ is uniformly small for all $x \in X$, we might replace it by $\sup_{x \in X} \rho(f_n(x), f(x))$; that is, note that $(f_n)$ converges uniformly to $f$ if and only if, for each $\varepsilon > 0$, there is some $N$ such that $\sup_{x \in X} \rho(f_n(x), f(x)) < \varepsilon$ for all $n \ge N$. (Why?) Said in still other words, $(f_n)$ converges uniformly to $f$ on $X$ if and only if $\sup_{x \in X} \rho(f_n(x), f(x)) \to 0$ as $n \to \infty$. (Look familiar?)

Notice that a uniformly convergent sequence is also pointwise convergent (to the same limit). In other words, uniform convergence is “stronger” than pointwise convergence. (Why?)
In this terminology we would say that the sequence $(g_n)$ of Example 10.1 (b) converges pointwise to 0 on $[0,1]$, but not uniformly; the sequence $(h_n)$ of Example 10.1 (c) converges uniformly to 0 on $[0,1]$; and the sequence $(k_n)$ of Example 10.2 (c) converges pointwise to $k$ on $[0,1]$, but not uniformly. Notice, too, that uniform convergence depends on the underlying domain. Indeed, although $(k_n)$ is not uniformly convergent on all of $[0,1]$, it is uniformly convergent (to 0) on any interval of the form $[0, b]$, where $0 < b < 1$, because $\sup_{0 \le x \le b} |k_n(x)| = \sup_{0 \le x \le b} |x^n| = b^n \to 0$ as $n \to \infty$. Similarly, $(g_n)$ converges uniformly to 0 on any interval of the form $[a, 1]$, where $0 < a < 1$. (Why?)
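The three verdicts above can be probed numerically by sampling $\sup |f_n - f|$ on a grid. A grid can only under-estimate a supremum, so this is a diagnostic, not a proof. As in the sketch for Example 10.1 (b), the spike $g_n$ is assumed to be a triangle of height $2n$ on $[0, 1/n]$; the helper names are ours.

```python
def grid_sup_distance(fn, f, a, b, m=10_000):
    """Estimate sup_{a <= x <= b} |fn(x) - f(x)| from m + 1 equally spaced sample points."""
    xs = [a + (b - a) * i / m for i in range(m + 1)]
    return max(abs(fn(x) - f(x)) for x in xs)

def g(n, x):
    """Assumed triangular spike of Example 10.1 (b): height 2n at 1/(2n), zero outside [0, 1/n]."""
    peak = 1.0 / (2 * n)
    if 0.0 <= x <= peak:
        return 4.0 * n * n * x
    if peak < x <= 1.0 / n:
        return 4.0 * n * n * (1.0 / n - x)
    return 0.0

def zero(x):
    return 0.0

for n in (5, 20, 80):
    print(n,
          grid_sup_distance(lambda x: g(n, x), zero, 0.0, 1.0),    # about 2n: not uniform on [0, 1]
          grid_sup_distance(lambda x: g(n, x), zero, 0.1, 1.0),    # 0 once 1/n < 0.1: uniform on [0.1, 1]
          grid_sup_distance(lambda x: x ** (n + 1) / (n + 1), zero, 0.0, 1.0),  # h_n: 1/(n+1)
          grid_sup_distance(lambda x: x ** n, zero, 0.0, 0.9))     # k_n on [0, 0.9]: 0.9**n
```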

Examples 10.3

(a) Uniform convergence is meaningful on unbounded intervals, too. For example, consider $f_n(x) = x/(1 + nx^2)$ for $x \in \mathbb{R}$ and $n = 1, 2, \ldots$. It is easy to see that $(f_n)$ converges pointwise to 0 on $\mathbb{R}$. To test whether the convergence is actually uniform, we might try computing the maximum value of $|f_n|$ on $\mathbb{R}$ (using familiar tools from calculus). Now $f_n'(x) = (1 - nx^2)/(1 + nx^2)^2$, which is 0 at $x = \pm 1/\sqrt{n}$, and it follows from the first derivative test that $f_n(\pm 1/\sqrt{n}) = \pm 1/(2\sqrt{n})$ are the maximum and minimum values of $f_n$. That is, $\sup_{x \in \mathbb{R}} |f_n(x)| = 1/(2\sqrt{n}) \to 0$ as $n \to \infty$, and so $(f_n)$ converges uniformly to 0 on $\mathbb{R}$.
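The calculus in Example 10.3 (a) can be cross-checked by brute force: evaluate $f_n$ at $\pm 1/\sqrt{n}$ and confirm that no sample on a wide window exceeds $1/(2\sqrt{n})$. The window $[-50, 50]$ and the sample count are arbitrary choices of ours, and sampling a finite window cannot by itself certify a supremum over all of $\mathbb{R}$.

```python
import math

def f(n, x):
    """f_n(x) = x / (1 + n x^2) from Example 10.3 (a)."""
    return x / (1.0 + n * x * x)

def sampled_sup(n, half_width=50.0, m=100_000):
    """Largest |f_n| over m + 1 equally spaced points of [-half_width, half_width]."""
    return max(abs(f(n, -half_width + 2 * half_width * i / m)) for i in range(m + 1))

for n in (1, 4, 100, 10_000):
    calculus_value = 1.0 / (2.0 * math.sqrt(n))      # |f_n(+-1/sqrt(n))|
    print(n, f(n, 1.0 / math.sqrt(n)), calculus_value, sampled_sup(n))
```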

(b) Uniform convergence is also meaningful for unbounded functions. A somewhat contrived example should be sufficient to see what is going on. If we set $g_n(x) = x^3 + (1/n)$ for $x \in \mathbb{R}$ and $n = 1, 2, \ldots$, then, clearly, $(g_n)$ converges uniformly to $g(x) = x^3$ on $\mathbb{R}$. (Why?) In other words, the functions $g_n$ need not be bounded; the important thing is that the difference $g_n - g$ must be bounded (and tend uniformly to 0, of course).

(c) For bounded, real-valued functions on $\mathbb{N}$, uniform convergence is the same as convergence in the metric of $\ell_\infty$. That is, if $f, f_n \in \ell_\infty$, then $(f_n)$ converges uniformly to $f$ on $\mathbb{N}$ if and only if $\|f_n - f\|_\infty \to 0$ as $n \to \infty$.

(d) If we identify $\mathbb{R}^n$ with the real-valued functions on the set $\{1, \ldots, n\}$, then uniform convergence on $\{1, \ldots, n\}$ coincides with convergence in any norm on $\mathbb{R}^n$. (Why?)

By way of shorthand, we will occasionally (and sparingly) use the following notation. We write $f_n \to f$ on $X$, or $f_n \to f$ (with no additional quantifiers), to mean that $(f_n)$ converges pointwise to $f$ on $X$. We write $f_n \rightrightarrows f$ on $X$, or $f_n \rightrightarrows f$, to mean that $(f_n)$ converges uniformly to $f$ on $X$. This notation is intended as a visual reminder that uniform convergence is “stronger” than pointwise convergence. But, just to be on the safe side, any additional quantifiers always take precedence; for example, the statements “$f_n \to f$ uniformly on $X$” and “$f_n \to f$ in (the metric of) $C[0,1]$” should be interpreted to mean that $(f_n)$ converges uniformly to $f$. Obviously, we will have to be careful to avoid any confusion caused by this variety of notations. A comparison of the “abbreviated” definitions of pointwise versus uniform convergence pinpoints their differences: $f_n \to f$ means

$$\forall x \in X,\ \forall \varepsilon > 0,\ \exists N \ge 1 \text{ such that } \rho(f_n(x), f(x)) < \varepsilon,\ \forall n \ge N,$$

while $f_n \rightrightarrows f$ means

$$\forall \varepsilon > 0,\ \exists N \ge 1 \text{ such that } \rho(f_n(x), f(x)) < \varepsilon,\ \forall x \in X,\ \forall n \ge N.$$

In other words, just as in the case of uniform continuity, the quantifier “$\forall x$” has moved past “$\exists N$” (and so $\varepsilon$ and $N$ no longer depend on $x$).


EXERCISES 

5. Suppose that $f_n : [a, b] \to \mathbb{R}$ is an increasing function for each $n$, and that $f(x) = \lim_{n\to\infty} f_n(x)$ exists for each $x$ in $[a, b]$. Is $f$ increasing?

6. Let $f_n : [a, b] \to \mathbb{R}$ satisfy $|f_n(x)| \le 1$ for all $x$ and $n$. Show that there is a subsequence $(f_{n_k})$ such that $\lim_{k\to\infty} f_{n_k}(x)$ exists for each rational $x$ in $[a, b]$. [Hint: This is a “diagonalization” argument.]

> 7. Let $(f_n)$ and $(g_n)$ be real-valued functions on a set $X$, and suppose that $(f_n)$ and $(g_n)$ converge uniformly on $X$. Show that $(f_n + g_n)$ converges uniformly on $X$. Give an example showing that $(f_n g_n)$ need not converge uniformly on $X$ (although it will converge pointwise, of course).

8. Let $f_n : \mathbb{R} \to \mathbb{R}$, and suppose that $f_n \rightrightarrows 0$ on every closed, bounded interval $[a, b]$. Does it follow that $f_n \rightrightarrows 0$ on $\mathbb{R}$? Explain.

> 9. For each of the following sequences, determine the pointwise limit on the given interval (if it exists) and the intervals on which the convergence is uniform (if any):

(a) $f_n(x) = x^n$ on $(-1, 1]$;

(b) $f_n(x) = n^2 x(1 - x^2)^n$ on $[0, 1]$;

(c) $f_n(x) = nx/(1 + nx)$ on $[0, \infty)$;

(d) $f_n(x) = nx/(1 + n^2 x^2)$ on $[0, \infty)$;

(e) $f_n(x) = xe^{-nx}$ on $[0, \infty)$;

(f) $f_n(x) = nxe^{-nx}$ on $[0, \infty)$.

In each of the above examples, will term-by-term integration or differentiation lead to a correct result?

10. Let $f : \mathbb{R} \to \mathbb{R}$ be uniformly continuous, and define $f_n(x) = f(x + (1/n))$. Show that $f_n \rightrightarrows f$ on $\mathbb{R}$.

11. Suppose that $f_n \rightrightarrows f$ on $\mathbb{R}$, and that $f : \mathbb{R} \to \mathbb{R}$ is continuous. Show that $f_n(x + (1/n)) \to f(x)$ (pointwise) on $\mathbb{R}$.

12. Prove that a sequence of functions $f_n : X \to \mathbb{R}$, where $X$ is any set, is uniformly convergent if and only if it is uniformly Cauchy. That is, prove that there exists some $f : X \to \mathbb{R}$ such that $f_n \rightrightarrows f$ on $X$ if and only if, for each $\varepsilon > 0$, there exists an $N \ge 1$ such that $\sup_{x \in X} |f_n(x) - f_m(x)| < \varepsilon$ whenever $m, n \ge N$. [Hint: Notice that if $(f_n)$ is uniformly Cauchy, then it is also pointwise Cauchy. That is, if $\sup_{x \in X} |f_n(x) - f_m(x)| \to 0$ as $m, n \to \infty$, then $(f_n(x))$ is Cauchy in $\mathbb{R}$ for each $x \in X$.]
13. Here is a “negative” test for uniform convergence: Suppose that $(X, d)$ and $(Y, \rho)$ are metric spaces, that $f_n : X \to Y$ is continuous for each $n$, and that $(f_n)$ converges pointwise to $f$ on $X$. If there exists a sequence $(x_n)$ in $X$ such that $x_n \to x$ in $X$ but $f_n(x_n) \not\to f(x)$, show that $(f_n)$ does not converge uniformly to $f$ on $X$.


Interchanging Limits 

As we have seen, pointwise convergence is not always enough to guarantee the interchange of limits. In this section we will see that uniform convergence, on the other hand, does often allow for an interchange of limits.

As a first result along these lines, we will prove that the uniform limit of a sequence 
of continuous functions is again continuous. (Compare this with Cauchy’s “wrong” 
theorem.) 

Theorem 10.4. Let $(X, d)$ and $(Y, \rho)$ be metric spaces, and let $f$ and $(f_n)$ be functions mapping $X$ into $Y$. If $(f_n)$ converges uniformly to $f$ on $X$, and if each $f_n$ is continuous at $x \in X$, then $f$ is also continuous at $x$.

Proof. Let $\varepsilon > 0$. Since $(f_n)$ converges uniformly to $f$, we can find an $m$ such that $\rho(f(y), f_m(y)) < \varepsilon/3$ for all $y \in X$ (we only need one such $m$). Next, since $f_m$ is continuous at $x$, there is a $\delta > 0$ such that $\rho(f_m(x), f_m(y)) < \varepsilon/3$ whenever $d(x, y) < \delta$. Thus, if $d(x, y) < \delta$, then

$$\rho(f(x), f(y)) \le \rho(f(x), f_m(x)) + \rho(f_m(x), f_m(y)) + \rho(f_m(y), f(y)) < \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon. \qquad \square$$

To see that Theorem 10.4 is indeed a statement about the interchange of limits, let’s rewrite its conclusion. If $x_m \to x$ in $X$, then

$$f(x) = \lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} \lim_{m\to\infty} f_n(x_m),$$

since $(f_n)$ converges pointwise to $f$ and each $f_n$ is continuous at $x$. To say that $f$ is also continuous at $x$ would mean that

$$f(x) = \lim_{m\to\infty} f(x_m) = \lim_{m\to\infty} \lim_{n\to\infty} f_n(x_m).$$

Thus, in the presence of uniform convergence, we must have

$$\lim_{n\to\infty} \lim_{m\to\infty} f_n(x_m) = \lim_{m\to\infty} \lim_{n\to\infty} f_n(x_m).$$

In particular, Theorem 10.4 tells us that the space $C[a, b]$ is closed under the taking of uniform limits. That is, if $(f_n)$ is a sequence in $C[a, b]$, and if $(f_n)$ converges uniformly to $f$ on $[a, b]$, then $f \in C[a, b]$. This is very comforting since, as we have seen, convergence in the metric of $C[a, b]$ coincides with uniform convergence. Specifically,

$$f_n \to f \text{ in } C[a, b] \iff \|f_n - f\|_\infty \to 0 \iff f_n \rightrightarrows f \text{ on } [a, b].$$


EXERCISES 

> 14. Let $f_n : \mathbb{R} \to \mathbb{R}$ be continuous for each $n$, and suppose that $f_n \rightrightarrows f$ on each closed, bounded interval $[a, b]$. Show that $f$ is continuous on $\mathbb{R}$.

15. Let $(X, d)$ and $(Y, \rho)$ be metric spaces, and let $f, f_n : X \to Y$ with $f_n \rightrightarrows f$ on $X$. If each $f_n$ is continuous at $x \in X$, and if $x_n \to x$ in $X$, prove that $\lim_{n\to\infty} f_n(x_n) = f(x)$.

16. Let $(X, d)$ and $(Y, \rho)$ be metric spaces, and let $f, f_n : X \to Y$ with $f_n \rightrightarrows f$ on $X$. Show that $D(f) \subset \bigcup_{n=1}^{\infty} D(f_n)$, where $D(f)$ is the set of discontinuities of $f$.

17. Suppose that $f, f_n : X \to \mathbb{R}$.

(a) Show that the set on which $(f_n)$ converges pointwise to $f$ is given by

$$\bigcap_{k=1}^{\infty} \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \{x : |f_n(x) - f(x)| < 1/k\}.$$

(b) What is the set on which $(f_n(x))$ is Cauchy? If $X$ is a metric space, and if each $f_n$ is continuous on $X$, what type of set is this?

> 18. Here is a partial converse to Theorem 10.4, called Dini’s theorem. Let $X$ be a compact metric space, and suppose that the sequence $(f_n)$ in $C(X)$ increases pointwise to a continuous function $f \in C(X)$; that is, $f_n(x) \le f_{n+1}(x)$ for each $n$ and $x$, and $f_n(x) \to f(x)$ for each $x$. Prove that the convergence is actually uniform. The same is true if $(f_n)$ decreases pointwise to $f$. [Hint: First reduce to the case where $(f_n)$ decreases pointwise to 0. Now, given $\varepsilon > 0$, consider the (open) sets $U_n = \{x \in X : f_n(x) < \varepsilon\}$.] Give an example showing that $f \in C(X)$ is necessary.


Our next two results supply an interchange of limits for integrals and derivatives. 

Theorem 10.5. Suppose that $f_n : [a, b] \to \mathbb{R}$ is continuous for each $n$, and that $(f_n)$ converges uniformly to $f$ on $[a, b]$. Then $\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx$.


Proof. Note that since $f \in C[a, b]$, the integral of $f$ is defined! Next,

$$\Big|\int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx\Big| \le \int_a^b |f_n(x) - f(x)|\,dx \le (b - a)\|f_n - f\|_\infty \to 0. \qquad \square$$
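To see the estimate from the proof of Theorem 10.5 at work, the sketch below borrows $f_n(x) = x/(1 + nx^2)$ from Example 10.3 (a), restricted to the (arbitrarily chosen) interval $[0, 2]$, where $f_n \rightrightarrows 0$ and $\|f_n\|_\infty = 1/(2\sqrt{n})$. The exact integral $\ln(1 + 4n)/(2n)$ comes from the antiderivative $\ln(1 + nx^2)/(2n)$, and it should sit below the bound $(b - a)\|f_n\|_\infty$. The function names are ours.

```python
import math

def f(n, x):
    """f_n(x) = x / (1 + n x^2), which tends uniformly to 0 (Example 10.3 (a))."""
    return x / (1.0 + n * x * x)

def exact_integral(n, b=2.0):
    """Integral of f_n over [0, b], from the antiderivative ln(1 + n x^2) / (2n)."""
    return math.log(1.0 + n * b * b) / (2.0 * n)

def midpoint_integral(func, a, b, m=20_000):
    """Composite midpoint rule, used only to double-check exact_integral."""
    h = (b - a) / m
    return h * sum(func(a + (i + 0.5) * h) for i in range(m))

a, b = 0.0, 2.0
for n in (1, 10, 100, 1000):
    bound = (b - a) / (2.0 * math.sqrt(n))   # (b - a) * ||f_n - 0||_inf on [0, 2]
    print(n, exact_integral(n), bound)
```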


Example 10.6 

Suppose that the trigonometric series $(a_0/2) + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$ is uniformly convergent on the interval $[-\pi, \pi]$. Then, according to Theorem 10.4, its sum $g(x)$ is a continuous function on $[-\pi, \pi]$. It now follows from Theorem 10.5 that this series must, in fact, be the Fourier series for $g(x)$. Indeed, for any $k = 1, 2, 3, \ldots$ we have

$$\begin{aligned}
\int_{-\pi}^{\pi} g(x) \sin kx\,dx
&= \int_{-\pi}^{\pi} \Big[\frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)\Big] \sin kx\,dx \\
&= \frac{a_0}{2} \int_{-\pi}^{\pi} \sin kx\,dx + \sum_{n=1}^{\infty} a_n \int_{-\pi}^{\pi} \cos nx \sin kx\,dx + \sum_{n=1}^{\infty} b_n \int_{-\pi}^{\pi} \sin nx \sin kx\,dx \\
&= \pi b_k,
\end{aligned}$$

since Theorem 10.5 grants term-by-term integration. (Why?) A similar calculation shows that $\pi a_k = \int_{-\pi}^{\pi} g(x) \cos kx\,dx$. We will return to this issue in subsequent chapters.
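A numerical illustration of Example 10.6, under an assumption of our own: take the hypothetical series $\sum_{n \ge 1} \sin(nx)/n^2$, so $a_n = 0$ and $b_n = 1/n^2$. It converges uniformly because $|\sin(nx)/n^2| \le 1/n^2$. The sketch replaces the sum $g$ by the partial sum $S_{200}$, which differs from $g$ by at most $\sum_{n > 200} 1/n^2 < 1/200$ uniformly, and recovers $b_k$ as $\frac{1}{\pi}\int_{-\pi}^{\pi} S_{200}(x) \sin kx\,dx$. The midpoint rule with many nodes integrates these low-degree trigonometric polynomials essentially exactly.

```python
import math

N_TERMS = 200        # truncation level for the hypothetical series below
M = 2000             # midpoint-rule subintervals on [-pi, pi]

def S(x):
    """Partial sum of the hypothetical series sum_{n>=1} sin(n x) / n^2 (so a_n = 0, b_n = 1/n^2)."""
    return sum(math.sin(n * x) / n ** 2 for n in range(1, N_TERMS + 1))

h = 2 * math.pi / M
nodes = [-math.pi + (i + 0.5) * h for i in range(M)]
values = [S(x) for x in nodes]            # sample S once, reuse for every k

def sine_moment(k):
    """Midpoint-rule approximation of the integral of S(x) sin(kx) over [-pi, pi]."""
    return h * sum(v * math.sin(k * x) for v, x in zip(values, nodes))

def cosine_moment(k):
    """Midpoint-rule approximation of the integral of S(x) cos(kx) over [-pi, pi]."""
    return h * sum(v * math.cos(k * x) for v, x in zip(values, nodes))

for k in (1, 2, 3, 10):
    print(k, sine_moment(k) / math.pi, 1 / k ** 2, cosine_moment(k))
```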


Now that we know how to exchange limits and integrals, the Fundamental Theorem 
of Calculus will tell us how to exchange limits and derivatives. While our next result 
may look “overspecified,” it’s really very useful. 


Theorem 10.7. Suppose that $(f_n)$ is a sequence of real-valued functions, each having a continuous derivative on $[a, b]$, and suppose that the sequence of derivatives $(f_n')$ converges uniformly to a function $g$ on $[a, b]$. If $(f_n(x_0))$ converges at some point $x_0$ in $[a, b]$, then, in fact, $(f_n)$ converges uniformly to a differentiable function $f$ on $[a, b]$. Moreover, $f' = g$. That is, $(f_n')$ converges uniformly to $f'$ on $[a, b]$.


Proof. Let’s first check that $(f_n)$ converges pointwise to some function $f$ on $[a, b]$. Let $C = \lim_{n\to\infty} f_n(x_0)$. Then, for any $x \in [a, b]$ we have

$$f_n(x) = f_n(x_0) + \int_{x_0}^{x} f_n' \longrightarrow C + \int_{x_0}^{x} g,$$

since $f_n' \rightrightarrows g$. Thus, $f_n \to f$, where $f(x) = C + \int_{x_0}^{x} g$. It follows that $f(x) = f(a) + \int_a^x g$ for any $x$ in $[a, b]$. The right-hand side of this expression is (continuously) differentiable and, hence, so is $f$. Moreover, $f' = g$. That is, $f_n' \rightrightarrows f'$ on $[a, b]$.

Finally, to show that $(f_n)$ converges uniformly to $f$, we just repeat our first calculation:

$$|f_n(x) - f(x)| \le |f_n(a) - f(a)| + \int_a^x |f_n' - f'| \le |f_n(a) - f(a)| + (b - a)\|f_n' - f'\|_\infty \longrightarrow 0.$$

The right-hand side tends to 0 independent of $x$; hence, $|f_n(x) - f(x)| \to 0$ uniformly in $x$. □
EXERCISES

19. Suppose that $(f_n)$ is a sequence of functions in $C[0,1]$ and that $f_n \rightrightarrows f$ on $[0,1]$. True or false? $\int_0^{1 - (1/n)} f_n \to \int_0^1 f$.

20. $C^{(1)}[a, b]$ is the vector space of all functions $f : [a, b] \to \mathbb{R}$ having a continuous first derivative on $[a, b]$. Show that $C^{(1)}[a, b]$ is complete under the norm $\|f\|_{C^{(1)}} = \max_{a \le x \le b} |f(x)| + \max_{a \le x \le b} |f'(x)|$.

21. Use Dini’s theorem to conclude that the sequence $(1 + (x/n))^n$ converges uniformly to $e^x$ on every compact interval in $\mathbb{R}$. How does this explain the findings in Example 10.1 (a)?

22. Recall that we have defined a metric on $C(\mathbb{R})$ by setting $d(f, g) = \sum_{n=1}^{\infty} 2^{-n} d_n(f, g)/(1 + d_n(f, g))$, where $d_n(f, g) = \max_{|t| \le n} |f(t) - g(t)|$ (see Exercise 5.64). Prove that $(f_n)$ converges to $f$ in the metric of $C(\mathbb{R})$ if and only if $(f_n)$ converges uniformly to $f$ on every compact subset of $\mathbb{R}$. For this reason, convergence in $C(\mathbb{R})$ is sometimes called uniform convergence on compacta.


The Space of Bounded Functions 

Given a set $X$, we write $B(X)$ for the vector space of all bounded, real-valued functions $f : X \to \mathbb{R}$, and we supply $B(X)$ with the sup-norm $\|f\|_\infty = \sup_{x \in X} |f(x)|$. That is, $B(X)$ is just $\ell_\infty(X)$ with a new name. (The notation $B(X)$ is somewhat more commonplace than $\ell_\infty(X)$.) Thus, convergence in $B(X)$ is the same as uniform convergence. Specifically,

$$f_n \to f \text{ in } B(X) \iff \|f_n - f\|_\infty \to 0 \iff f_n \rightrightarrows f \text{ on } X.$$

Moreover, $B(X)$ is complete under the sup-norm. The proof is exactly the same as that for $\ell_\infty(X)$, of course, which means that it is essentially the same as that for $\ell_\infty$. (Compare the proof of the following lemma with the “three-step” procedure outlined in Chapter Seven.)

Lemma 10.8. If $(f_n)$ is a Cauchy sequence in $B(X)$, then $(f_n)$ converges uniformly to some $f \in B(X)$. Moreover, $\sup_n \|f_n\|_\infty < \infty$ and $\|f_n\|_\infty \to \|f\|_\infty$ as $n \to \infty$.

Proof. The last two assertions follow from general principles: If $(f_n)$ is a Cauchy sequence in $B(X)$, then $(f_n)$ is also a bounded sequence in $B(X)$; that is, $\sup_n \|f_n\|_\infty < \infty$. And if $(f_n)$ converges to $f$ in the norm of $B(X)$, then $\|f_n\|_\infty \to \|f\|_\infty$ as $n \to \infty$. (Why?)

Now, if $(f_n)$ is Cauchy in $B(X)$, then $(f_n)$ is also pointwise Cauchy; that is, for each $x \in X$ we have $|f_m(x) - f_n(x)| \le \|f_m - f_n\|_\infty \to 0$ as $m, n \to \infty$, and so $(f_n(x))$ is a Cauchy sequence in $\mathbb{R}$ for each $x \in X$. Consequently, $f(x) = \lim_{n\to\infty} f_n(x)$ exists for each $x \in X$. But, as we have already noted, $(f_n)$ is a bounded sequence in $B(X)$; thus, $|f(x)| = \lim_{n\to\infty} |f_n(x)| \le \sup_n \|f_n\|_\infty = C$, and hence $\|f\|_\infty \le C$, too. That is, $f \in B(X)$.

Finally, to see that $(f_n)$ converges uniformly to $f$, let $\varepsilon > 0$ and $x \in X$. Then

$$|f(x) - f_n(x)| = \lim_{m\to\infty} |f_m(x) - f_n(x)| \le \varepsilon$$

for all $n$ sufficiently large, since $|f_m(x) - f_n(x)| \le \|f_m - f_n\|_\infty < \varepsilon$ for all $m, n$ sufficiently large. And since this estimate is independent of $x$, we get $\|f - f_n\|_\infty \le \varepsilon$ for all $n$ sufficiently large. □

A Cauchy sequence in $B(X)$ is often said to be uniformly Cauchy, while a bounded sequence in $B(X)$ is often said to be uniformly bounded; both terms emphasize the presence of the uniform norm, or sup-norm.

The fact that $B(X)$ is complete is even more meaningful in the case where $X$ is a metric space, for then we may also consider the space $C(X)$ of continuous, real-valued functions on $X$. Now continuous functions on $X$ are not necessarily bounded; in other words, $C(X)$ is not, in general, a subspace of $B(X)$. Thus we are led to consider the vector space $C_b(X) = C(X) \cap B(X)$ of all bounded, continuous, real-valued functions on $X$. It follows from Theorem 10.4 that $C_b(X)$ is a closed subspace of $B(X)$; hence $C_b(X)$ is complete under the sup-norm. (Why?)

If $X$ is a compact metric space, then $C_b(X) = C(X)$ and, what’s more, we may use the simpler expression $\|f\|_\infty = \max_{x \in X} |f(x)|$ in place of the sup-norm on $C(X)$. (Why?) In particular, $C[a, b]$ is a complete normed vector space under the sup-norm (i.e., under uniform convergence).

Now that we know that $B(X)$ is a complete normed vector space, we may take advantage of yet another observation from Chapter Seven, namely, Banach’s characterization of completeness for normed spaces. The following special case of Theorem 7.12 is often called the Weierstrass $M$-test.

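Lemma 10.9. Let $(g_n)$ be a sequence in $B(X)$ satisfying $\sum_{n=1}^{\infty} \|g_n\|_\infty < \infty$. Then $\sum_{n=1}^{\infty} g_n$ converges in $B(X)$; that is, $\sum_{n=1}^{\infty} g_n$ converges uniformly on $X$. Moreover,

$$\Big\|\sum_{n=1}^{\infty} g_n\Big\|_\infty \le \sum_{n=1}^{\infty} \|g_n\|_\infty.$$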

The usual notation in most advanced calculus books is to set $M_n = \|g_n\|_\infty = \sup_{x \in X} |g_n(x)|$ (for the Max of the $n$th term), and consequently to require that $\sum_{n=1}^{\infty} M_n < \infty$. Hence the name “$M$-test.”
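The practical content of the $M$-test is a uniform error bound: $\sup_x |S_K(x) - S_N(x)| \le \sum_{n=N+1}^{K} M_n$. The sketch below checks this on a grid for the hypothetical series $\sum_{n \ge 1} \cos(nx)/n^2$, where $M_n = 1/n^2$; the series, the grid, and the truncation levels $N = 20$, $K = 1000$ are illustrative choices of ours.

```python
import math

def partial_sum(x, N):
    """S_N(x) = sum_{n=1}^{N} cos(n x) / n^2  (a hypothetical series with M_n = 1/n^2)."""
    return sum(math.cos(n * x) / n ** 2 for n in range(1, N + 1))

def m_test_tail(N, K):
    """sum_{n=N+1}^{K} M_n, which bounds sup_x |S_K(x) - S_N(x)| by the triangle inequality."""
    return sum(1.0 / n ** 2 for n in range(N + 1, K + 1))

xs = [-math.pi + 2 * math.pi * i / 200 for i in range(201)]
N, K = 20, 1000
observed = max(abs(partial_sum(x, K) - partial_sum(x, N)) for x in xs)
print(observed, m_test_tail(N, K), 1 / N)   # observed <= tail bound < 1/N
```

Here the bound is essentially attained at $x = 0$, where every $\cos(nx)$ equals 1.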

Application 10.10. (Power Series) If the power series $\sum_{n=0}^{\infty} a_n x^n$ converges for some $x_0 \ne 0$, then it converges uniformly (and absolutely) on every interval $|x| \le R$, where $0 < R < |x_0|$. Hence, the sum represents a continuous function for $|x| < |x_0|$. Moreover, term-by-term differentiation (in $|x| < |x_0|$) or integration (over $[a, b] \subset (-|x_0|, |x_0|)$) leads to a correct result.

Proof. First notice that if $\sum_{n=0}^{\infty} a_n x_0^n$ converges, then the terms in the series are bounded, say, $|a_n| |x_0|^n \le C$ for all $n$. Next fix $0 < R < |x_0|$, and let $r = R/|x_0| < 1$. Then, for $|x| \le R$, we get

$$|a_n x^n| = |a_n| |x_0|^n \Big|\frac{x}{x_0}\Big|^n \le C r^n.$$

Thus, since $\sum_{n=0}^{\infty} C r^n < \infty$, we have $\sum_{n=0}^{\infty} a_n x^n$ converging uniformly (and absolutely) on $|x| \le R$ by the $M$-test. The sum $\sum_{n=0}^{\infty} a_n x^n$ is then continuous on $(-|x_0|, |x_0|)$ because it is continuous on each interval $[-R, R]$ for $R < |x_0|$.

Term-by-term integration over $[a, b] \subset (-|x_0|, |x_0|)$ follows from Theorem 10.5 (applied to $f_n(x) = \sum_{k=0}^{n} a_k x^k$).

Finally, to show that the sum is differentiable, we appeal to Theorem 10.7 (again applied to $f_n(x) = \sum_{k=0}^{n} a_k x^k$). The proof relies on the same technique used above, but now we make use of the fact that $\sum_{n=1}^{\infty} n r^{n-1}$ converges for $0 < r < 1$. It follows that the series $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges uniformly (and absolutely) for $|x| \le R$, where $R < |x_0|$, and so it must converge to $(d/dx)\big(\sum_{n=0}^{\infty} a_n x^n\big)$. □
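For a concrete instance of the differentiation step, take the hypothetical choice $a_n = 1$ for all $n$. Then $\sum x^n = 1/(1 - x)$ converges at any $x_0$ with $|x_0| < 1$, and the differentiated series $\sum n x^{n-1}$ should converge uniformly to $1/(1 - x)^2$ on $|x| \le R$. The sketch compares the actual uniform error on $[-1/2, 1/2]$ with the $M$-test tail $\sum_{n > N} n R^{n-1} = R^N (N + 1 - NR)/(1 - R)^2$, obtained by differentiating $\sum_{n > N} R^n = R^{N+1}/(1 - R)$.

```python
def derivative_partial_sum(x, N):
    """sum_{n=1}^{N} n x^(n-1): the differentiated series for the hypothetical choice a_n = 1."""
    return sum(n * x ** (n - 1) for n in range(1, N + 1))

def tail_bound(R, N):
    """sum_{n>N} n R^(n-1) = R^N (N + 1 - N R) / (1 - R)^2, the M-test tail on |x| <= R < 1."""
    return R ** N * (N + 1 - N * R) / (1 - R) ** 2

R, N = 0.5, 20
xs = [-R + 2 * R * i / 400 for i in range(401)]
# d/dx of sum x^n = 1/(1 - x) is 1/(1 - x)^2; the uniform error on [-R, R] is at most tail_bound(R, N).
err = max(abs(derivative_partial_sum(x, N) - 1 / (1 - x) ** 2) for x in xs)
print(err, tail_bound(R, N))
```

The error is largest at $x = R$, where every term of the tail is positive, so there the bound is attained.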


Application 10.11. (A Space-Filling Curve) We next construct a pair of continuous functions $x(t)$ and $y(t)$ on $[0,1]$ such that the curve $t \mapsto (x(t), y(t))$ fills the unit square $[0,1] \times [0,1]$. In fact, our construction will show that the curve maps $\Delta$ onto $[0,1] \times [0,1]$.


Proof. To begin, we define a map $f : \mathbb{R} \to [0,1]$ as follows: Let $f(t) = 0$ for $0 \le t \le 1/3$, let $f(t) = 3t - 1$ for $1/3 \le t \le 2/3$, and let $f(t) = 1$ for $2/3 \le t \le 1$. Note that if $t \in \Delta$, then $f(t)$ is half the first digit in the ternary decimal expansion of $t$ (that is, $f(t)$ is 0 or 1 according as that digit is 0 or 2). We next extend $f$ to all of $\mathbb{R}$ by taking $f$ to be even and periodic, of period 2, as shown in Figure 10.5.






The basis of our construction lies in the observation that the function $g(t) = \sum_{k=0}^{\infty} 2^{-k-1} f(3^k t)$ agrees with the Cantor function for $t \in \Delta$. That is, $g(t)$ is another extension of the Cantor function to $[0,1]$ (indeed, to all of $\mathbb{R}$). To see that this is so, let $t = 0.(2a_0)(2a_1)(2a_2)\cdots$ (base 3), where each $a_k$ is 0 or 1, be a point in $\Delta$. Then, since $f$ is periodic with period 2, we have

$$\begin{aligned}
f(3^k t) &= f\big(0.(2a_k)(2a_{k+1})(2a_{k+2})\cdots \text{ (base 3)}\big) \quad \text{(Why?)} \\
&= 0 \text{ if } a_k = 0, \text{ since } 0.0b_2b_3\cdots \text{ (base 3)} \in [0, 1/3], \\
&= 1 \text{ if } a_k = 1, \text{ since } 0.2b_2b_3\cdots \text{ (base 3)} \in [2/3, 1].
\end{aligned}$$

That is, $f(3^k t) = a_k$ for $t \in \Delta$, and hence

$$g(t) = \sum_{k=0}^{\infty} 2^{-k-1} a_k = 0.a_0a_1a_2\cdots \text{ (base 2)}.$$
Now we are ready to define our curve; set $x(t) = \sum_{k=0}^{\infty} 2^{-k-1} f(3^{2k} t)$ and $y(t) = \sum_{k=0}^{\infty} 2^{-k-1} f(3^{2k+1} t)$. By the $M$-test, $x$ and $y$ are continuous on all of $\mathbb{R}$ and, clearly, each maps $\mathbb{R}$ into $[0,1]$. (Why?)

To see that $(x(t), y(t))$ fills the square, let $x_0, y_0 \in [0,1]$ and write their base 2 decimal expansions just so:

$$x_0 = 0.a_0a_2a_4\cdots \text{ (base 2)} \quad \text{and} \quad y_0 = 0.a_1a_3a_5\cdots \text{ (base 2)}.$$

Now set $t_0 = 0.(2a_0)(2a_1)(2a_2)(2a_3)\cdots$ (base 3) $\in \Delta$. Then $x(t_0) = x_0$ and $y(t_0) = y_0$ since $f(3^k t_0) = a_k$ for each $k$. Thus the curve maps $\Delta$ onto $[0,1] \times [0,1]$. □
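The construction can be traced exactly with rational arithmetic (Python’s standard `fractions` module), which sidesteps the loss of precision that floating point would suffer when computing $3^k t$ for large $k$. The sketch below is our own illustration: `curve` keeps only the first `terms` terms of each series, which is harmless here because the chosen $t_0$ has finitely many nonzero ternary digits, so all later terms vanish. We interleave the binary digits of $x_0 = 11/16$ and $y_0 = 3/8$ into $t_0$ and confirm that the curve returns $(x_0, y_0)$.

```python
from fractions import Fraction

def f(t):
    """Even, 2-periodic extension of: 0 on [0, 1/3], 3t - 1 on [1/3, 2/3], 1 on [2/3, 1]."""
    t = abs(t) % 2                  # use evenness, then reduce modulo the period 2
    if t > 1:
        t = 2 - t                   # reflect [1, 2] onto [0, 1] (evenness plus period 2)
    if t <= Fraction(1, 3):
        return Fraction(0)
    if t <= Fraction(2, 3):
        return 3 * t - 1
    return Fraction(1)

def curve(t, terms=20):
    """Partial sums of x(t) = sum 2^(-k-1) f(3^(2k) t) and y(t) = sum 2^(-k-1) f(3^(2k+1) t)."""
    x = sum(Fraction(1, 2 ** (k + 1)) * f(3 ** (2 * k) * t) for k in range(terms))
    y = sum(Fraction(1, 2 ** (k + 1)) * f(3 ** (2 * k + 1) * t) for k in range(terms))
    return x, y

def cantor_point(x_bits, y_bits):
    """t0 = 0.(2a0)(2a1)(2a2)... (base 3), with x0 = 0.a0 a2 a4... and y0 = 0.a1 a3 a5... (base 2)."""
    digits = []
    for bx, by in zip(x_bits, y_bits):
        digits += [bx, by]          # interleave: a0, a1, a2, a3, ...
    return sum(Fraction(2 * a, 3 ** (j + 1)) for j, a in enumerate(digits))

t0 = cantor_point([1, 0, 1, 1], [0, 1, 1, 0])   # x0 = 0.1011 = 11/16, y0 = 0.0110 = 3/8 (base 2)
print(t0, curve(t0))
```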

The $M$-test can be used to give yet another description of the Cantor function, this one more in the spirit of our “middle thirds” construction (see Chapter Two). Specifically, we will simultaneously build the $n$th level Cantor set (we called this set $I_n$ in Chapter Two) and an $n$th level polygonal approximation $f_n : [0,1] \to [0,1]$ to the Cantor function $f$. (A polygonal function is a continuous function whose graph consists of finitely many straight line segments. Thus, a polygonal function is completely determined by its values at the finitely many “nodes” $x_1, \ldots, x_k$ corresponding to the finitely many “vertices” of its graph.)



[Figure 10.6: the polygonal approximations $f_1$ and $f_2$.]

To define the first approximation $f_1$, set $f_1(0) = 0$, $f_1(1/3) = 1/2 = f_1(2/3)$, and $f_1(1) = 1$, and then extend $f_1$ to all of $[0,1]$ by “connecting the dots.” That is, $f_1$ is a polygonal function on $[0,1]$ with “nodes” at the endpoints 0, 1/3, 2/3, and 1 of $I_1 = [0, 1/3] \cup [2/3, 1]$, as shown in Figure 10.6. Note that $f_1$ is constant on the first “discarded” interval $J_1 = (1/3, 2/3)$.

The second polygonal approximation $f_2$ is obtained by adding a few more nodes to the definition of $f_1$; namely, let $f_2$ agree with $f_1$ at each of the points 0, 1/3, 2/3, and 1, and now include $f_2(1/9) = 1/4 = f_2(2/9)$ and $f_2(7/9) = 3/4 = f_2(8/9)$, as shown in Figure 10.6. Again, $f_2$ has nodes at the endpoints of $I_2 = [0, 1/9] \cup [2/9, 1/3] \cup [2/3, 7/9] \cup [8/9, 1]$, and $f_2$ is constant on each subinterval of $J_2 = (1/9, 2/9) \cup (1/3, 2/3) \cup (7/9, 8/9)$. Note that the graph of $f_2$ contains two “scaled-down” copies of the graph of $f_1$.

Can you see how we will define $f_3$? We will add eight more nodes to the definition of $f_2$, corresponding to the eight new endpoints introduced in $I_3$, and we will take $f_3$ to be constant on each of the subintervals of $J_3$ (using 1/8, 3/8, 5/8, and 7/8 as the four new values), so that $f_3$ agrees with $f_2$ on $J_2$ and agrees with $f_1$ on $J_1$. If you draw the graph of $f_3$, you will see four “miniature” copies of the graph of $f_1$ (or two copies of the graph of $f_2$).

If we continue this process, we will get a sequence of increasing, continuous, polygonal functions $(f_n)$ on $[0, 1]$ such that $f_n$ is constant on each subinterval of $J_n$ and linear on each subinterval of $I_n$. In particular, each $f_n$ is designed to agree with the Cantor function $f$ on $J_n$. Using induction (based on the graphs of $f_1$ and $f_2$ and “scaling”), it is not hard to see that $\|f_{n+1} - f_n\|_\infty < 2^{-n-1}$ for any $n$. Thus, by the M-test, the series $f_1 + \sum_{n=1}^{\infty} (f_{n+1} - f_n)$ converges uniformly to an increasing continuous function $g$ on $[0, 1]$ (in other words, $f_n \rightrightarrows g$). But then $g$ must agree with the Cantor function $f$ on $\bigcup_{n=1}^{\infty} J_n = [0, 1] \setminus \Delta$, a dense subset of $[0, 1]$. Consequently, since $g$ and $f$ are both continuous, $g = f$.
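The estimate $\|f_{n+1} - f_n\|_\infty < 2^{-n-1}$ can be tested numerically. The Python sketch below assumes one reading of the “scaling” remark: $f_0(x) = x$, and $f_n$ is built from $f_{n-1}$ by placing half-size copies on $[0, 1/3]$ and $[2/3, 1]$ with the value $1/2$ in between. This reproduces $f_1$ and $f_2$ as described above. Since $f_{n+1} - f_n$ is linear between consecutive multiples of $3^{-(n+1)}$, evaluating at those nodes gives the exact sup norm. The function names are hypothetical.

```python
from fractions import Fraction

def cantor_poly(n, x):
    # Assumed self-similar rule, starting from f_0(x) = x:
    # f_n(x) = f_{n-1}(3x)/2 on [0, 1/3], 1/2 on [1/3, 2/3], 1/2 + f_{n-1}(3x - 2)/2 on [2/3, 1]
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return cantor_poly(n - 1, 3 * x) / 2
    if x < Fraction(2, 3):
        return Fraction(1, 2)
    return Fraction(1, 2) + cantor_poly(n - 1, 3 * x - 2) / 2

def sup_gap(n):
    # ||f_{n+1} - f_n||_inf, computed exactly at the nodes k/3^(n+1)
    m = 3 ** (n + 1)
    return max(abs(cantor_poly(n + 1, Fraction(k, m)) - cantor_poly(n, Fraction(k, m)))
               for k in range(m + 1))

for n in range(1, 6):
    print(n, sup_gap(n), "<", Fraction(1, 2 ** (n + 1)))
```

In this model the gap works out to $1/(6 \cdot 2^n)$, halving at each level, which is consistent with the bound used in the text.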

Next, let’s resolve an issue left over from Chapter Nine, namely, the converse to Theorem 9.2: Every $F_\sigma$ subset of $\mathbb{R}$ can be realized as the set of discontinuities of some (bounded) function $f : \mathbb{R} \to \mathbb{R}$.

Application 10.12. (Discontinuous Functions) Let $F$ be a nonempty $F_\sigma$ subset of $\mathbb{R}$. Then, $F = D(f)$ for some bounded function $f : \mathbb{R} \to \mathbb{R}$.

proof. Write $F = \bigcup_{n=1}^{\infty} F_n$, where each $F_n$ is a closed set in $\mathbb{R}$. Since finite unions of closed sets are again closed, we may assume that $F_n \subset F_{n+1}$ for each $n$. Now, for each $n$, let $G_n = \mathbb{Q} \cap F_n^\circ$, the rationals in the interior of $F_n$, and let $f_n = \chi_{F_n} - \chi_{G_n} = \chi_{F_n \setminus G_n}$. Then, $f_n$ is clearly continuous at each point in the complement of $F_n$, and $f_n$ is discontinuous on $F_n$ since the oscillation of $f_n$ is 1 at each point of $F_n$. (Why?) Thus, $D(f_n) = F_n$.

Next, let $f = \sum_{n=1}^{\infty} 4^{-n} f_n$. It follows from the M-test (and Theorem 10.4) that $f$ is a bounded function on $\mathbb{R}$ that is continuous on the complement of $F$. To see that $f$ is discontinuous at each point of $F$, let $x \in F$ and choose $n$ such that $x \in F_n \setminus F_{n-1}$ (where $F_0 = \varnothing$). Then $x \notin F_k$ for $k < n$, so each such $f_k$ is continuous at $x$, while $x \in F_k$ for all $k \ge n$. Hence, the oscillation of $f$ at $x$ is at least $4^{-n} - \sum_{k > n} 4^{-k} = 4^{-n}(2/3) > 0$. □

As a final application of the M-test, we construct a continuous nondifferentiable function. The first published example of such a function was given by Weierstrass, who showed that the function $f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x)$, where $0 < a < 1$ and $b$ is an odd integer satisfying $ab > 1 + 3\pi/2$, fails to have a finite derivative at any point. The following is a simplified version of Weierstrass’s example.

Application 10.13. (Nowhere Differentiable Functions) Given $x \in \mathbb{R}$, let $g(x)$ denote the distance from $x$ to the nearest integer, and define $f(x) = \sum_{n=0}^{\infty} 2^{-n} g(2^n x)$. Then, $f$ is a bounded (uniformly) continuous function on $\mathbb{R}$ that fails to have a finite derivative at any point of $\mathbb{R}$.
proof. The graph of $g(x)$ is pictured in Figure 10.7. Note that $g$ has period 1 while $g(2^n x)$ has period $2^{-n}$. In particular, if $x$ is a dyadic rational, $x = i2^{-n}$, for some integers $i$ and $n \ge 1$, then $2^k x$ is an integer for all $k \ge n$, and so $g(2^k x) = 0$ for all $k \ge n$.



By the M-test, $f$ is a bounded continuous function on $\mathbb{R}$. (Since $f$ is periodic with period 1, note that $f$ is actually uniformly continuous.) Now, if $f$ has a finite (two-sided) derivative at some (fixed) $x \in \mathbb{R}$, then

$$
\frac{f(v_n) - f(u_n)}{v_n - u_n} \to f'(x)
$$

for any $(u_n)$ and $(v_n)$ with $u_n \le x \le v_n$, $u_n < v_n$, and $v_n - u_n \to 0$. (Why?) To show that $f$ is nondifferentiable, then, we will show that this limit fails to exist for a suitable choice of $(u_n)$ and $(v_n)$.

Given $n \ge 1$, let $u_n$ and $v_n$ be the pair of successive dyadic rationals satisfying $u_n \le x < v_n$ and $v_n - u_n = 2^{-n}$. Then, since the terms with $k \ge n$ vanish at both $u_n$ and $v_n$ (as noted above),

$$
\frac{f(v_n) - f(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} \frac{g(2^k v_n) - g(2^k u_n)}{2^k v_n - 2^k u_n}.
$$

But $2^k u_n = 2^{k-n} 2^n u_n = 2^{k-n} i$ and $2^k v_n = 2^{k-n}(i + 1)$, for some integer $i$. Since $2^{k-n} \le 1/2$ for $k < n$, this means that $2^k u_n$ and $2^k v_n$ both lie in the same “half-period” for $g$ and hence that $g$ is linear on the interval $[2^k u_n, 2^k v_n]$. Thus each of the difference quotients in the sum on the right is $\pm 1$; that is,

$$
d_n = \frac{f(v_n) - f(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} (\pm 1).
$$

Hence, the sequence of difference quotients $(d_n)$ cannot converge to a finite limit because successive terms always differ by at least 1. (Indeed, $d_n$ is a sum of $n$ terms, each equal to $\pm 1$, so it is an integer with the same parity as $n$.) □
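For a concrete look at the difference quotients $d_n$, here is a Python sketch using exact rational arithmetic. The point $x = 1/3$ is an arbitrary illustrative choice, and the helper names are hypothetical. Because $u_n$ and $v_n$ have denominator $2^n$, the terms with $k \ge n$ vanish, so a finite sum gives $f(u_n)$ and $f(v_n)$ exactly.

```python
import math
from fractions import Fraction

def g(x):
    # Distance from x to the nearest integer
    return abs(x - round(x))

def f_dyadic(x, n):
    # f(x) = sum_k 2^{-k} g(2^k x); for x = i/2^n every term with k >= n vanishes,
    # so the first n terms give f(x) exactly
    return sum(g(2 ** k * x) / 2 ** k for k in range(n))

def d(x, n):
    # Difference quotient over the dyadic interval [u_n, v_n] of length 2^-n with u_n <= x < v_n
    i = math.floor(x * 2 ** n)
    u, v = Fraction(i, 2 ** n), Fraction(i + 1, 2 ** n)
    return (f_dyadic(v, n) - f_dyadic(u, n)) / (v - u)

x = Fraction(1, 3)
ds = [d(x, n) for n in range(1, 13)]
print([int(q) for q in ds])   # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

For this $x$ the quotients alternate between 1 and 0, so they cannot converge. The $k$th term of the sum is $+1$ or $-1$ according to whether the $(k+1)$st binary digit of $x$ is 0 or 1.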


EXERCISES 

▻ 23. Show that $B(X)$ is an algebra of functions; that is, if $f, g \in B(X)$, then so is $fg$ and $\|fg\|_\infty \le \|f\|_\infty \|g\|_\infty$. Moreover, if $f_n \to f$ and $g_n \to g$ in $B(X)$, show that $f_n g_n \to fg$ in $B(X)$. (Thus, multiplication is continuous in $B(X)$. Compare this with Exercise 7.)

24. $B(X)$ is also a lattice: If $f, g \in B(X)$, show that the functions $f \vee g = \max\{f, g\}$ and $f \wedge g = \min\{f, g\}$ (defined pointwise, just as in Chapter Five) are also in $B(X)$ and satisfy $\|f \vee g\|_\infty \le \max\{\|f\|_\infty, \|g\|_\infty\}$ and $\|f \wedge g\|_\infty \le \max\{\|f\|_\infty, \|g\|_\infty\}$.

25. Show that $B[0, 1]$ is not separable. [Hint: This is analogous to the proof that $\ell_\infty$ is not separable. Consider the collection of characteristic functions of the intervals $[0, x]$ for $0 < x < 1$.]

26. If $\sum_{n=1}^{\infty} |a_n| < \infty$, prove that $\sum_{n=1}^{\infty} a_n \sin nx$ and $\sum_{n=1}^{\infty} a_n \cos nx$ are uniformly convergent on $\mathbb{R}$.

27. Show that $\sum_{n=1}^{\infty} x^2/(1 + x^2)^n$ converges for all $|x| < 1$, but that the convergence is not uniform. [Hint: Find the sum!]

28. Let $f_n : \mathbb{R} \to \mathbb{R}$ be continuous, and suppose that $(f_n)$ converges uniformly on $\mathbb{Q}$. Show that $(f_n)$ actually converges uniformly on all of $\mathbb{R}$. [Hint: Show that $(f_n)$ is uniformly Cauchy.]

29.

(a) For which values of $x$ does $\sum_{n=1}^{\infty} n e^{-nx}$ converge? On which intervals is the convergence uniform?

(b) Conclude that $\int_1^2 \sum_{n=1}^{\infty} n e^{-nx}\,dx = e/(e^2 - 1)$.

30. Prove that $\sum_{n=1}^{\infty} x/[n^\alpha(1 + nx^2)]$ converges uniformly on every bounded interval in $\mathbb{R}$ provided that $\alpha > 1/2$. Is the convergence uniform on all of $\mathbb{R}$?

31. Show that $\lim_{x \to 1} \sum_{n=1}^{\infty} nx^2/(n^3 + x^3) = \sum_{n=1}^{\infty} n/(n^3 + 1)$.

32.

(a) If $\sum_{n=1}^{\infty} |a_n| < \infty$, show that $\sum_{n=1}^{\infty} a_n e^{-nx}$ is uniformly convergent on $[0, \infty)$.

(b) If we assume only that $(a_n)$ is bounded, show that $\sum_{n=1}^{\infty} a_n e^{-nx}$ is uniformly convergent on $[\delta, \infty)$ for every $\delta > 0$.

▻ 33. Define $I(x) = 0$ for $x < 0$ and $I(x) = 1$ for $x \ge 0$. Given sequences $(x_n)$ and $(c_n)$ in $\mathbb{R}$, with $\sum_{n=1}^{\infty} |c_n| < \infty$, show that $f(x) = \sum_{n=1}^{\infty} c_n I(x - x_n)$ defines a bounded function on $\mathbb{R}$ that is continuous except, possibly, at the $x_n$.

34. Let $0 \le g_n \in C[a, b]$. If $\sum_{n=1}^{\infty} g_n$ converges pointwise to a continuous function on $[a, b]$, show that $\sum_{n=1}^{\infty} g_n$ converges uniformly on $[a, b]$.

35. For which $\alpha \in \mathbb{R}$ is $\sum_{n=1}^{\infty} x n^\alpha e^{-nx}$ a continuous function on $(0, \infty)$? on $[0, \infty)$?

36. Show that both $\sum_{n=1}^{\infty} x^n (1 - x)$ and $\sum_{n=1}^{\infty} (-1)^n x^n (1 - x)$ are convergent on $[0, 1]$, but only one converges uniformly. Which one? Why?

37. Where does $\sum_{n=1}^{\infty} x^n/(1 + x^n)$ converge? On which intervals does it converge uniformly?

38. Let $(f_n)$ be a sequence of continuous functions on $(0, \infty)$ with $|f_n(x)| \le n$ for every $x > 0$ and $n \ge 1$, and such that $\lim_{x \to \infty} f_n(x) = 0$ for each $n$. Show that $f(x) = \sum_{n=1}^{\infty} 2^{-n} f_n(x)$ defines a continuous function on $(0, \infty)$ that also satisfies $\lim_{x \to \infty} f(x) = 0$.

39. Show that $C(\mathbb{R})$ is complete. [Hint: Use the fact that $C[-n, n]$ is complete for each $n$. See Exercise 22.]

40. For any metric space $X$, show that $X$ is isometric to a subset of $C_b(X)$. [Hint: Mimic the proof of Lemma 7.17, showing that $X$ embeds into $\ell_\infty(X) = B(X)$.] Conclude that $X$ has a completion.




Notes and Remarks 

For more on the history of the vibrating string problem, see Carslaw [1930], Hobson [1927, Volume II], Kline [1972], Langer [1947], Rogosinski [1950], Van Vleck [1914], and the excerpt “Riemann on Fourier series and the Riemann integral” in Birkhoff [1973] (wherein you will also find three excerpts from Fourier’s work); the excerpt is from Riemann [1902], in which Riemann develops his concept of the integral to address the problem of representing continuous functions by trigonometric series. For a detailed solution of the vibrating string problem see Folland [1992] or Tolstov [1962].

For a brief history of Fourier analysis, see the articles by Coppel [1969], Gibson [1893], Jackson [1920], Jeffery [1956], and Langer [1947]. For more recent commentary see Grattan-Guinness [1970], Halmos’s “Progress Report” on Fourier series, Halmos [1978], the follow-up article by Bochner [1979], and Zygmund [1976]. For more information on Fourier himself see the biographies by Grattan-Guinness [1972] and Herivel [1975], the article by González-Velasco [1992] (which is the source of the quote at the beginning of the chapter), and Körner [1988]. In addition to containing entertaining historical tidbits, Körner’s book is an excellent introduction to Fourier analysis. For an enlightening discussion of the impact of Cauchy’s famous “wrong” theorem and its connection with Fourier’s work, see Lakatos [1976].

For more details on Exercise 4 (and related issues), see Jackson [1926, 1934a, 1941], 
Rogosinski [1950], and Simon [1969]. The first general convergence result for Fourier 
series is generally attributed to Jordan [1881]. 

Pointwise convergence is “as old as the hills,” and it is at least as old as calculus itself. Uniform convergence was first introduced by Seidel [1847], and in the same year by George Stokes [1848]; see Hardy [1918], Hawkins [1970], and Lakatos [1976]. Once the notion of uniform convergence was recognized as the proper tool for the preservation of continuity in the limit, Weierstrass and his students began a “witch hunt” for the uses of Cauchy’s theorem during the previous 50 years, in an attempt to set the record straight. The age of rigor would come to full maturity under Weierstrass’s guidance. For more about Weierstrass himself, see Polubarinova-Kochina [1966].

The example of a space-filling curve given in Application 10.11 is due to Schoenberg, by way of Lebesgue, and first appeared in Schoenberg [1938]. Curiously, Schoenberg’s curve turns out to be nowhere differentiable, whereas Lebesgue’s (the one that we discussed in Chapter Six) is differentiable almost everywhere. For more on this see Schoenberg [1982] or Sagan [1986, 1992]. The Schoenberg-Lebesgue example is typical of a wider class of space-filling curves; in particular, the curve $(x(t), y(t))$ is space-filling whenever $x$ and $y$ are stochastically independent. See Holbrook [1991].

The construction in Application 10.12 is based on the presentation in Oxtoby [1971].
Both Weierstrass and Riemann spoke of continuous, nowhere differentiable functions in their lectures as early as 1861 (and other examples of such functions are now known to have existed prior to 1861), but the first published example is due to Weierstrass, an example that finally appeared in du Bois-Reymond [1875]. See also Weierstrass [1895, Vol. 2, pp. 71-76]. For more about Riemann’s examples, see Hardy [1916], Hawkins [1970], Neuenschwander [1978], Segal [1978], and A. Smith [1972].

The example of a continuous, nowhere differentiable function constructed in Application 10.13 is generally credited to van der Waerden [1930]. The particulars of the present construction are taken from Billingsley [1982], but see also Boas [1960].

A great deal has been written about nondifferentiable functions in general and Weierstrass’s example in particular. A short but thorough historical account is given in Hobson [1927, Volume II], but see also Hardy [1916]. A longer account, which includes some discussion of space-filling curves, is given in Singh [1969].

Exercise 40 pinpoints our interest in $C_b(X)$ and $B(X)$: They are “universal” metric spaces. In order to “know” all metric spaces, it is enough to know just the spaces $C_b(X)$. We will have more to say about this point of view in the next chapter. For now, simply notice that $C_b(X)$ determines $X$ in the sense that the bounded, continuous, real-valued functions on $X$ determine the closed sets in $X$ (see Chapter Five). For detailed proofs of the results in Exercise 40, see Kaplansky [1977].
The Weierstrass Theorem

While we now know something about convergence in $C(X)$, there are many more things that we would like to know about $C(X)$. We will find the task unmanageable, however, unless we place some restrictions on the metric space $X$. If we focus our attention on the case when $X$ is compact, for example, we will be afforded plenty of extra machinery: In this case, $C(X)$ is not only a vector space, an algebra, and a lattice (where algebraic operations are defined pointwise), but also a complete normed space under the sup-norm. With all of these tools to work with, we will be able to accomplish quite a bit. And at least a few of our results will apply equally well to the space $C_b(X)$ of bounded continuous functions on a general metric space $X$. For the remainder of this chapter, then, unless otherwise specified, $X$ will denote a compact metric space.

We will concentrate on two questions in particular, and each of these will lead to 
some interesting applications: 

• Is C(X) separable? More importantly, are there any “useful” dense subspaces, or 
even dense subalgebras, or sublattices of C(X)? 

• What are the compact subsets of $C(X)$? And are such sets “useful”?

Either question is tough to answer in full generality, but the first one has a very satisfactory and easy-to-understand answer for $C[a, b]$. Since $C[a, b]$ is such an important space for our purposes, besides being the obvious place to start, we will spend much of our effort on just this case. An initial simplification will help (see Exercise 5.63).

Lemma 11.1. There is a linear isometry from $C[0, 1]$ onto $C[a, b]$ that maps polynomials to polynomials.

proof. Define $\sigma : [a, b] \to [0, 1]$ by $\sigma(x) = (x - a)/(b - a)$ for $a \le x \le b$. Then $\sigma$ is a homeomorphism, and the map $T_\sigma(f) = f \circ \sigma$ defines a linear isometry from $C[0, 1]$ onto $C[a, b]$. Indeed, $T_\sigma$ is clearly linear. It is one-to-one and onto because it has an obvious inverse, namely, $T_\sigma^{-1}(h) = h \circ \sigma^{-1}$, where $\sigma^{-1}(t) = a + t(b - a)$ for $0 \le t \le 1$. Finally, it is an isometry because $\sigma$ is onto:

$$
\max_{a \le x \le b} |f(\sigma(x))| = \max_{t \in \sigma([a, b])} |f(t)| = \max_{0 \le t \le 1} |f(t)|.
$$

Moreover, $T_\sigma$ is both a lattice isomorphism and an algebra isomorphism. That is, $T_\sigma(f) \le T_\sigma(g)$ if and only if $f \le g$, and $T_\sigma(fg) = T_\sigma(f)\,T_\sigma(g)$. In particular, note that $T_\sigma$ maps polynomials to polynomials: If $p(t) = \sum_{k=0}^{n} a_k t^k$ is a polynomial in $t$, then $p(\sigma(x)) = \sum_{k=0}^{n} a_k [(x - a)/(b - a)]^k$ is a polynomial in $x$. □
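The last step of the proof is mechanical enough to automate. The Python sketch below (helper names hypothetical) expands $p(\sigma(x))$ by Horner’s rule using exact fractions and returns the coefficients of the resulting polynomial in $x$; it then checks that $T_\sigma(p)$ takes the same values at corresponding points of $[a, b]$ and $[0, 1]$, which is why the two sup norms agree.

```python
from fractions import Fraction

def poly_mul(p, q):
    # Product of coefficient lists (constant term first)
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_add(p, q):
    # Sum of coefficient lists, padding the shorter one with zeros
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [pi + qi for pi, qi in zip(p, q)]

def compose_with_sigma(p, a, b):
    # Coefficients in x of p(sigma(x)), where sigma(x) = (x - a)/(b - a)
    sigma = [Fraction(-a) / (b - a), Fraction(1) / (b - a)]
    result = [Fraction(p[-1])]
    for coeff in reversed(p[:-1]):          # Horner: result = result * sigma + coeff
        result = poly_add(poly_mul(result, sigma), [Fraction(coeff)])
    return result

def evaluate(p, x):
    return sum(c * x ** k for k, c in enumerate(p))

p = [1, -3, 2]                  # p(t) = 1 - 3t + 2t^2
a, b = 2, 5
q = compose_with_sigma(p, a, b)
print(q)                        # [Fraction(35, 9), Fraction(-17, 9), Fraction(2, 9)]

# T_sigma(p) at x = a + t(b - a) equals p at t
ts = [Fraction(j, 30) for j in range(31)]
assert all(evaluate(q, a + t * (b - a)) == evaluate(p, t) for t in ts)
```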

The proof of Lemma 11.1 tells us that $C[a, b]$ and $C[0, 1]$ are, for our purposes, identical. The point here is that we need only concern ourselves with a single choice of the interval $[a, b]$, and $[0, 1]$ is often most convenient. Virtually any result that we might obtain about $C[0, 1]$ will readily transfer to $C[a, b]$. To begin, we will show that $C[a, b]$ is separable by showing that $C[0, 1]$ is separable. We will give two proofs of this result, the first of which is a “proof by picture” while the second is more analytical.

Theorem 11.2. $C[0, 1]$ is separable.

proof. Let $f \in C[0, 1]$, and let $\varepsilon > 0$. We first approximate $f$ by a polygonal function, as shown in Figure 11.1. Since $f$ is uniformly continuous, we can find a sufficiently large $n$ so that $|f(x) - f(y)| < \varepsilon$ whenever $|x - y| \le 1/n$. This means that the polygonal function $g$ defined by $g(k/n) = f(k/n)$, for $k = 0, \ldots, n$, and $g$ linear on each interval $[k/n, (k+1)/n]$ satisfies $\|f - g\|_\infty < \varepsilon$. (Why?)

Next, let $h$ be a polygonal function that also has nodes at $k/n$ for $k = 0, \ldots, n$, but with $h(k/n)$ rational and satisfying $|h(k/n) - g(k/n)| < \varepsilon$ for each $k$. Then $\|g - h\|_\infty < \varepsilon$ and, consequently, $\|f - h\|_\infty < 2\varepsilon$.

We’re done! The set of all polygonal functions taking only rational values at the nodes $(k/n)_{k=0}^{n}$, for some $n$, is countable. (See Exercise 1.) □
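Here is a minimal Python sketch of the two approximation steps in this proof, for one sample function. The choice $f(x) = \sin 3x$, the tolerance, and the names are illustrative assumptions, and the sup norm is only estimated on a finite grid. Since $|f'| \le 3$, any $n > 3/\varepsilon$ makes $|f(x) - f(y)| < \varepsilon$ whenever $|x - y| \le 1/n$, and `Fraction.limit_denominator(q)` returns the closest fraction with denominator at most $q$, hence one within $1/(2q)$ of $g(k/n) = f(k/n)$.

```python
import math
from fractions import Fraction

def rational_polygonal(f, n, eps):
    # Rational node values: closest fraction with denominator <= q, so error <= 1/(2q) <= eps/2
    q = math.ceil(1 / eps)
    values = [Fraction(f(k / n)).limit_denominator(q) for k in range(n + 1)]

    def h(x):
        k = min(int(x * n), n - 1)      # x lies in [k/n, (k+1)/n]
        t = x * n - k                   # position of x inside that interval, in [0, 1]
        return (1 - t) * float(values[k]) + t * float(values[k + 1])

    return h, values

f = lambda x: math.sin(3 * x)          # sample function with |f'| <= 3
eps = 0.01
n = math.ceil(3 / eps) + 1             # ensures 3/n < eps
h, values = rational_polygonal(f, n, eps)

grid = [j / 10_000 for j in range(10_001)]
err = max(abs(f(x) - h(x)) for x in grid)
print(n, err, err < 2 * eps)
```

Only finitely many rational values are used for each $n$, which is the countability the proof relies on.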


EXERCISES 

▻ 1. For each $n$, let $Q_n$ be the set of all polygonal functions that have nodes at $x = k/n$, $k = 0, \ldots, n$, and that take on only rational values at these points. Check that $Q_n$ is a countable set, and hence that the union of the $Q_n$’s is a countable dense set in $C[0, 1]$.

2. Let $a = x_1 < x_2 < \cdots < x_n = b$ be distinct points in $[a, b]$, and let $S_n$ be the set of all polygonal functions having nodes at the $x_k$. Show that $S_n$ is an $n$-dimensional subspace of $C[a, b]$ spanned by the “angles” $\varphi_k(x) = |x - x_k| + (x - x_k)$, for $k = 1, \ldots, n - 1$, and the constant function $\varphi_0(x) = 1$. Specifically, show that each $h \in S_n$ can be uniquely written as $h(x) = \sum_{k=0}^{n-1} c_k \varphi_k(x)$. [Hint: The system of equations $h(x_k) = c_0 + 2\sum_{i=1}^{k-1} c_i (x_k - x_i)$, $k = 1, \ldots, n$, can be solved for the $c_i$. Why? How does this help?]

3. Prove that every polygonal function is Lipschitz. Thus, the Lipschitz functions are dense in $C[a, b]$.


Our second proof that $C[a, b]$ is separable uses a much more convenient dense set (at least for our purposes).

The Weierstrass Approximation Theorem 11.3. Given $f \in C[a, b]$ and $\varepsilon > 0$, there is a polynomial $p$ such that $\|f - p\|_\infty < \varepsilon$. Hence, there is a sequence of polynomials $(p_n)$ such that $p_n \rightrightarrows f$ on $[a, b]$.

The Weierstrass theorem leads to a second proof that $C[a, b]$ is separable. Indeed, given a polynomial $p$ and any $\varepsilon > 0$, we can find another polynomial $q$ with rational coefficients such that $\|p - q\|_\infty < \varepsilon$ on $[a, b]$. (How?) Since the set of polynomials with rational coefficients is countable, this implies that $C[a, b]$ is separable.

Of course, following Lemma 11.1, we need only establish the Weierstrass theorem for $C[0, 1]$. (Recall that our identification of $C[a, b]$ with $C[0, 1]$ preserves polynomials.) The proof that we will give in this case is quite explicit; we will actually display a sequence of polynomials that converges uniformly to a given $f \in C[0, 1]$. Specifically, given $f \in C[0, 1]$, we define the sequence $(B_n(f))_{n=1}^{\infty}$ of Bernstein polynomials for $f$ by

$$
(B_n(f))(x) = \sum_{k=0}^{n} f\!\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}, \qquad 0 \le x \le 1.
$$

Please note that $B_n(f)$ is a polynomial of degree at most $n$. Also, it is easy to see that $(B_n(f))(0) = f(0)$ and $(B_n(f))(1) = f(1)$. In general, $(B_n(f))(x)$ is a weighted average of the numbers $f(k/n)$, $k = 0, \ldots, n$ (more on this later).
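The definition translates directly into code. This Python sketch (the function name `bernstein` is ours) evaluates $B_n(f)$ with `math.comb` and estimates $\|f - B_n(f)\|_\infty$ on a grid for the sample function $f(x) = |x - 1/2|$; the grid maximum only approximates the true sup norm. Plain floating-point evaluation is adequate for the moderate $n$ used here, but for much larger $n$ the binomial coefficients no longer fit in a float and a more careful scheme would be needed.

```python
import math

def bernstein(f, n, x):
    # (B_n f)(x) = sum_{k=0}^n f(k/n) C(n, k) x^k (1 - x)^(n - k)
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)              # continuous, but not differentiable at 1/2
grid = [j / 100 for j in range(101)]
errors = {}
for n in (10, 100, 1000):
    errors[n] = max(abs(f(x) - bernstein(f, n, x)) for x in grid)
    print(n, errors[n])

# B_n(f) interpolates f at the endpoints
print(bernstein(f, 7, 0.0) == f(0.0), bernstein(f, 7, 1.0) == f(1.0))
```

The errors shrink as $n$ grows, but slowly; that is typical of Bernstein approximation at a corner of the graph.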

We will prove Weierstrass’s theorem by proving:

S. N. Bernstein’s Theorem 11.4. $B_n(f) \rightrightarrows f$ on $[0, 1]$ for each $f$ in $C[0, 1]$.

The proof of Bernstein’s theorem is easy once we catalogue a few facts about the polynomials $B_n(f)$. For later reference, let’s agree to write

$$
f_0(x) = 1, \qquad f_1(x) = x, \qquad \text{and} \qquad f_2(x) = x^2.
$$

Among other things, the following lemma establishes Bernstein’s theorem for these three polynomials. Curiously, these few special cases will imply the general result.

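Lemma 11.5.

(i) $B_n(f_0) = f_0$ and $B_n(f_1) = f_1$.

(ii) $B_n(f_2) = \left(1 - \frac{1}{n}\right) f_2 + \frac{1}{n} f_1$, and hence $B_n(f_2) \rightrightarrows f_2$.

(iii) $\displaystyle \sum_{k=0}^{n} \left(\frac{k}{n} - x\right)^2 \binom{n}{k} x^k (1 - x)^{n-k} = \frac{x(1 - x)}{n} \le \frac{1}{4n}$, if $0 \le x \le 1$.

(iv) Given $\delta > 0$ and $0 \le x \le 1$, let $F$ denote the set of $k$ in $\{0, \ldots, n\}$ for which $|(k/n) - x| \ge \delta$. Then

$$
\sum_{k \in F} \binom{n}{k} x^k (1 - x)^{n-k} \le \frac{1}{4n\delta^2}.
$$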


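proof. The fact that $B_n(f_0) = f_0$ follows from the binomial formula:

$$
\sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} = [x + (1 - x)]^n = 1.
$$

To see that $B_n(f_1) = f_1$, first notice that for $k \ge 1$ we have

$$
\frac{k}{n} \binom{n}{k} = \frac{(n-1)!}{(k-1)!\,(n-k)!} = \binom{n-1}{k-1}.
$$

Consequently,

$$
\sum_{k=0}^{n} \frac{k}{n} \binom{n}{k} x^k (1 - x)^{n-k} = x \sum_{k=1}^{n} \binom{n-1}{k-1} x^{k-1} (1 - x)^{(n-1)-(k-1)} = x \sum_{j=0}^{n-1} \binom{n-1}{j} x^{j} (1 - x)^{(n-1)-j} = x.
$$

Next, to compute $B_n(f_2)$, we rewrite twice:

$$
\begin{aligned}
\left(\frac{k}{n}\right)^2 \binom{n}{k} = \frac{k}{n} \binom{n-1}{k-1} &= \frac{n-1}{n} \cdot \frac{k-1}{n-1} \binom{n-1}{k-1} + \frac{1}{n} \binom{n-1}{k-1} && \text{if } k \ge 1,\\
&= \left(1 - \frac{1}{n}\right) \binom{n-2}{k-2} + \frac{1}{n} \binom{n-1}{k-1} && \text{if } k \ge 2.
\end{aligned}
$$

Thus,

$$
B_n(f_2)(x) = \left(1 - \frac{1}{n}\right) \sum_{k=2}^{n} \binom{n-2}{k-2} x^k (1 - x)^{n-k} + \frac{1}{n} \sum_{k=1}^{n} \binom{n-1}{k-1} x^k (1 - x)^{n-k} = \left(1 - \frac{1}{n}\right) x^2 + \frac{1}{n} x,
$$

which establishes (ii) since $\|B_n(f_2) - f_2\|_\infty = (1/n)\|f_1 - f_2\|_\infty \to 0$ as $n \to \infty$.

To prove (iii) we combine the observations in (i) and (ii) and simplify. Since $((k/n) - x)^2 = (k/n)^2 - 2x(k/n) + x^2$, we get

$$
\sum_{k=0}^{n} \left(\frac{k}{n} - x\right)^2 \binom{n}{k} x^k (1 - x)^{n-k} = \left(1 - \frac{1}{n}\right) x^2 + \frac{1}{n} x - 2x^2 + x^2 = \frac{1}{n} x(1 - x) \le \frac{1}{4n},
$$

for $0 \le x \le 1$.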
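Finally, to prove (iv), let $0 \le x \le 1$ and note that $1 \le ((k/n) - x)^2/\delta^2$ for $k \in F$. Hence,

$$
\sum_{k \in F} \binom{n}{k} x^k (1 - x)^{n-k} \le \frac{1}{\delta^2} \sum_{k \in F} \left(\frac{k}{n} - x\right)^2 \binom{n}{k} x^k (1 - x)^{n-k} \le \frac{1}{\delta^2} \sum_{k=0}^{n} \left(\frac{k}{n} - x\right)^2 \binom{n}{k} x^k (1 - x)^{n-k} \le \frac{1}{4n\delta^2},
$$

from (iii). □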
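Because every quantity in Lemma 11.5 is a finite sum, the identities can be checked exactly for small $n$. The following Python sketch does this with `fractions.Fraction`; the sample points $x$, the choice $\delta = 1/4$, and the helper names are illustrative. A finite check like this is a sanity check, not a proof.

```python
from fractions import Fraction
from math import comb

def weights(n, x):
    # Binomial weights C(n, k) x^k (1 - x)^(n - k), k = 0..n, as exact fractions
    return [comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(n + 1)]

def tail(n, x, delta):
    # Total weight of the k with |k/n - x| >= delta, as in part (iv)
    return sum(w for k, w in enumerate(weights(n, x)) if abs(Fraction(k, n) - x) >= delta)

for n in range(1, 9):
    for x in (Fraction(0), Fraction(2, 7), Fraction(1, 2), Fraction(1)):
        w = weights(n, x)
        s0 = sum(w)
        s1 = sum(Fraction(k, n) * wk for k, wk in enumerate(w))
        s2 = sum(Fraction(k, n) ** 2 * wk for k, wk in enumerate(w))
        assert s0 == 1 and s1 == x                                  # part (i)
        assert s2 == (1 - Fraction(1, n)) * x ** 2 + x / n          # part (ii)
        var = sum((Fraction(k, n) - x) ** 2 * wk for k, wk in enumerate(w))
        assert var == x * (1 - x) / n <= Fraction(1, 4 * n)         # part (iii)
        delta = Fraction(1, 4)
        assert tail(n, x, delta) <= 1 / (4 * n * delta ** 2)        # part (iv)
print("Lemma 11.5 (i)-(iv) hold exactly for n = 1, ..., 8 at the sample points")
```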


Now we are ready for the proof of Bernstein’s theorem:


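proof. Let $f \in C[0, 1]$ and let $\varepsilon > 0$. Then, since $f$ is uniformly continuous, there is a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon/2$ whenever $|x - y| < \delta$. Now we use Lemma 11.5 to estimate $\|f - B_n(f)\|_\infty$. First notice that since the numbers $\binom{n}{k} x^k (1 - x)^{n-k}$ are nonnegative and sum to 1, we have

$$
|f(x) - (B_n(f))(x)| = \left| \sum_{k=0}^{n} \left(f(x) - f\!\left(\frac{k}{n}\right)\right) \binom{n}{k} x^k (1 - x)^{n-k} \right| \le \sum_{k=0}^{n} \left| f(x) - f\!\left(\frac{k}{n}\right) \right| \binom{n}{k} x^k (1 - x)^{n-k}.
$$

Now fix $n$ (to be specified in a moment). Given $0 \le x \le 1$, let $F$ denote the set of $k$ in $\{0, \ldots, n\}$ for which $|(k/n) - x| \ge \delta$. Then $|f(x) - f(k/n)| < \varepsilon/2$ for $k \notin F$, while $|f(x) - f(k/n)| \le 2\|f\|_\infty$ for $k \in F$. Thus,

$$
\begin{aligned}
|f(x) - (B_n(f))(x)| &\le \frac{\varepsilon}{2} \sum_{k \notin F} \binom{n}{k} x^k (1 - x)^{n-k} + 2\|f\|_\infty \sum_{k \in F} \binom{n}{k} x^k (1 - x)^{n-k}\\
&\le \frac{\varepsilon}{2} \cdot 1 + 2\|f\|_\infty \cdot \frac{1}{4n\delta^2}, \quad \text{from Lemma 11.5 (iv)},\\
&< \varepsilon, \quad \text{provided that } n > \|f\|_\infty/(\varepsilon\delta^2).
\end{aligned}
$$

Since this choice of $n$ does not depend on $x$, we get that $\|B_n(f) - f\|_\infty < \varepsilon$ whenever $n > \|f\|_\infty/(\varepsilon\delta^2)$. □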
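The proof gives an explicit, if pessimistic, choice of $n$. The Python sketch below computes it for the sample function $f(x) = |x - 1/2|$, which has Lipschitz constant 1 and $\|f\|_\infty = 1/2$, so that $\delta = \varepsilon/2$ works; it then confirms on a grid that $B_n(f)$ is within $\varepsilon$ of $f$. The sample function, $\varepsilon = 1/5$, and the helper names are assumptions for illustration; the threshold is computed with exact fractions so that rounding cannot shift it.

```python
import math
from fractions import Fraction

def bernstein(f, n, x):
    # (B_n f)(x) = sum_{k=0}^n f(k/n) C(n, k) x^k (1 - x)^(n - k)
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

def degree_from_proof(sup_f, eps, delta):
    # Smallest integer n with n > ||f||_inf / (eps * delta^2)
    return math.floor(sup_f / (eps * delta ** 2)) + 1

f = lambda x: abs(x - 0.5)       # Lipschitz with constant L = 1, and ||f||_inf = 1/2
eps = Fraction(1, 5)
L = 1
delta = eps / (2 * L)            # |x - y| < delta implies |f(x) - f(y)| < L * delta = eps/2
n = degree_from_proof(Fraction(1, 2), eps, delta)

grid = [j / 100 for j in range(101)]
err = max(abs(f(x) - bernstein(f, n, x)) for x in grid)
print(n, err, err < eps)
```

Here $n = 251$, while the observed error is well below $\varepsilon$; the proof trades sharpness for simplicity.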


There is a probabilistic interpretation of Bernstein’s result. To see this, fix an $x$ in $[0, 1]$, and consider a “game” with probability of success equal to $x$ and, hence, probability of failure equal to $1 - x$. For instance, a coin might be weighted so as to come up heads with probability $x$ and tails with probability $1 - x$. Then, the probability of exactly $k$ successes in $n$ independent trials of the game is given by $\binom{n}{k} x^k (1 - x)^{n-k}$. This is one of the terms in the so-called binomial distribution. The first part of Lemma 11.5 says that this is, indeed, a probability distribution since $\sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} = 1$, and that the mean of this distribution is $\sum_{k=0}^{n} k \binom{n}{k} x^k (1 - x)^{n-k} = nx$. The second and third parts of Lemma 11.5 compute its variance as $\sum_{k=0}^{n} (k - nx)^2 \binom{n}{k} x^k (1 - x)^{n-k} = nx(1 - x)$. The last part of Lemma 11.5 is (Chebyshev’s version of) Bernoulli’s law of large numbers; it says that, for large $n$, most of the weight of the distribution is concentrated near the mean. Those terms for which $|k - nx|$ is large do not contribute much to the distribution. In other words, if $n$ is large, the most likely outcome of $n$ trials is to have roughly $nx$ successes. Thus, the proportion of successes, $k/n$, in a large number of trials is a good estimate for the actual probability of success $x$.

The binomial distribution in the case $n = 12$ and $x = 1/3$ is depicted in Figure 11.2. Note that the most likely outcome is $k = nx = 4$ successes; the probabilities of $k = 10$, 11, or 12 successes are so small that they do not even register on the graph.
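The numbers behind Figure 11.2 are easy to reproduce. This Python sketch computes the binomial probabilities for $n = 12$ and $x = 1/3$ exactly, along with the mean $nx$, the variance $nx(1 - x)$, the most likely outcome, and the combined probability of 10, 11, or 12 successes.

```python
from fractions import Fraction
from math import comb

n, x = 12, Fraction(1, 3)
pmf = [comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(n + 1)]

mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - n * x) ** 2 * p for k, p in enumerate(pmf))
mode = max(range(n + 1), key=lambda k: pmf[k])
tail = sum(pmf[10:])                       # P(k = 10, 11, or 12)

print(sum(pmf), mean, variance, mode)      # 1 4 8/3 4
print(tail, float(tail))
```

The tail probability is $289/531441$, roughly $5 \times 10^{-4}$, which is why those bars are invisible in the figure.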



To phrase Bernstein’s theorem in this language, consider $f \in C[0, 1]$ as the “payoff” for the game; if there are $k$ successes in $n$ trials, we win (or lose) an amount equal to $f(k/n)$. What are our expected winnings, given that the probability of success on any one trial is $x$? It is exactly $(B_n(f))(x)$! For $n$ large, then, we would expect our winnings to be approximately $f(x)$. The law of large numbers and the uniform continuity of $f$ are responsible for the fact that this approximation is uniform (the error depends on $f$ and $n$, but not on $x$).
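The “expected winnings” interpretation can be tested by simulation. In this Python sketch the payoff $f(t) = \cos 3t$, the values $n = 20$ and $x = 0.3$, the number of simulated games, and the seed are all arbitrary choices; the average payoff over many simulated games should be close to $(B_n(f))(x)$, which in turn is close to $f(x)$.

```python
import math
import random

def bernstein(f, n, x):
    # Expected payoff: sum over k of f(k/n) times P(exactly k successes in n trials)
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

def average_winnings(f, n, x, games, rng):
    # Simulate `games` rounds of n independent trials with success probability x
    total = 0.0
    for _ in range(games):
        k = sum(rng.random() < x for _ in range(n))    # number of successes this round
        total += f(k / n)
    return total / games

rng = random.Random(2024)        # fixed seed, chosen arbitrarily, for reproducibility
f = lambda t: math.cos(3 * t)    # sample payoff function
n, x = 20, 0.3
exact = bernstein(f, n, x)
estimate = average_winnings(f, n, x, games=20_000, rng=rng)
print(exact, estimate, f(x))
```

Increasing the number of games shrinks the sampling error, while increasing $n$ shrinks the gap between $(B_n(f))(x)$ and $f(x)$.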

We will see in the next chapter that Theorem 11.3 will generalize to $C(X)$, where $X$ is compact. On the other hand, the “easy” proof given for Theorem 11.2 would be hard to mimic in a more general setting. The major difference between the two results is that the polynomials form a subalgebra of $C[a, b]$ while the polygonal functions form only a subspace. The fact that the Weierstrass theorem admits an algebraic interpretation along these lines will prove very useful in the next chapter.

The Weierstrass theorem affords us some small insight into the moment problem. The problem, loosely stated, is this: Consider a thin metal rod placed along the interval $[a, b]$ on the $x$-axis, and suppose that we know the density of the rod at each point $x$ as a function $f(x)$ in $C[a, b]$. The question is: Does the sequence of moments $\mu_n = \int_a^b x^n f(x)\,dx$ (about the $y$-axis) uniquely determine $f$? If we knew the sequence of numbers $(\mu_n)$, could we actually reconstruct $f$? The answer, as it happens, is yes, but it is a bit beyond our means at this point. We can, however, say this much: The solution, if it exists, has to be unique. That is, if two functions $f$ and $g$ in $C[a, b]$ have the same moment sequence, then $f$ and $g$ must be identical. Thanks to the linearity of the integral, it is enough to establish the following:
Application 11.6. If $f \in C[a, b]$, and if $\int_a^b x^n f(x)\,dx = 0$ for each $n = 0, 1, 2, \ldots$, then $f = 0$.

proof. From the Weierstrass theorem, there is a sequence of polynomials $(p_n)$ such that $p_n \rightrightarrows f$ on $[a, b]$. Hence, $f \cdot p_n \rightrightarrows f^2$ on $[a, b]$, and so

$$
\int_a^b f^2(x)\,dx = \lim_{n \to \infty} \int_a^b f(x)\, p_n(x)\,dx.
$$

But since $\int_a^b x^n f(x)\,dx = 0$ for each $n$ (and since the integral is linear), it follows that $\int_a^b f(x)\, p_n(x)\,dx = 0$ for each $n$. That is, $\int_a^b f^2(x)\,dx = 0$. Since $f$ is continuous, this means that $f = 0$. (Why?) □


EXERCISES 

4. Give a detailed proof of the assertion that the Weierstrass theorem for general $[a, b]$ follows from the result on $[0, 1]$ (by using Lemma 11.1).

5. Show that $|B_n(f)| \le B_n(|f|)$, and that $B_n(f) \ge 0$ whenever $f \ge 0$. Conclude that $\|B_n(f)\|_\infty \le \|f\|_\infty$.

6. If $f \in B[0, 1]$, show that $B_n(f)(x) \to f(x)$ at each point of continuity of $f$.

▻ 7. If $p$ is a polynomial and $\varepsilon > 0$, prove that there is a polynomial $q$ with rational coefficients such that $\|p - q\|_\infty < \varepsilon$ on $[0, 1]$.

8. Prove that $C(\mathbb{R})$ is separable.

▻ 9. Let $\mathcal{P}_n$ denote the set of polynomials of degree at most $n$, considered as a subset of $C[a, b]$. Clearly, $\mathcal{P}_n$ is a subspace of $C[a, b]$ of dimension $n + 1$. Also, $\mathcal{P}_n$ is closed in $C[a, b]$. (Why?) How do you know that $\mathcal{P}$, the union of all of the $\mathcal{P}_n$, is not all of $C[a, b]$? That is, why are there necessarily nonpolynomial elements in $C[a, b]$?

10. Let $(x_i)$ be a sequence of numbers in $(0, 1)$ such that $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} x_i^k$ exists for every $k = 0, 1, 2, \ldots$. Show that $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i)$ exists for every $f \in C[0, 1]$.

11. Several proofs of the Weierstrass theorem are based on a special case that can be checked independently: There is a sequence of polynomials $(P_n)$ that converges uniformly to $|x|$ on $[-1, 1]$. Here is an outline of an elementary proof:

(a) Define $(P_n)$ recursively by $P_{n+1}(x) = P_n(x) + [x - P_n(x)^2]/2$, where $P_0(x) = 0$. Clearly, each $P_n$ is a polynomial.

(b) Check that $0 \le P_n(x) \le P_{n+1}(x) \le \sqrt{x}$ for $0 \le x \le 1$. Use Dini’s theorem (Exercise 10.18) to conclude that $P_n(x) \rightrightarrows \sqrt{x}$ on $[0, 1]$.

(c) $P_n(x^2)$ is also a polynomial, and $P_n(x^2) \rightrightarrows |x|$ on $[-1, 1]$.

Since a polygonal function can be written in the form $\sum_{i=1}^{n} a_i |x - x_i| + bx + d$, it follows that every polygonal function can be uniformly approximated by polynomials. The Weierstrass theorem now follows from the proof of Theorem 11.2.
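The recursion in part (a) is easy to run numerically. The following Python sketch (names are ours) evaluates $P_n(x)$ pointwise rather than expanding the polynomials, checks the inequalities of part (b) up to rounding error at a few sample points, and estimates the uniform errors for $\sqrt{x}$ on $[0, 1]$ and, via part (c), for $|x|$ on $[-1, 1]$. Convergence is noticeably slow near $x = 0$.

```python
import math

def P(x, n):
    # P_n(x) from P_0 = 0 and P_{k+1}(x) = P_k(x) + (x - P_k(x)^2) / 2
    p = 0.0
    for _ in range(n):
        p = p + (x - p * p) / 2
    return p

grid = [j / 1000 for j in range(1001)]              # points of [0, 1]

# Part (b): 0 <= P_n(x) <= P_{n+1}(x) <= sqrt(x), allowing for rounding
for x in grid[::50]:
    values = [P(x, n) for n in range(30)]
    assert all(0 <= a <= b + 1e-15 for a, b in zip(values, values[1:]))
    assert all(v <= math.sqrt(x) + 1e-12 for v in values)

sqrt_err = {n: max(abs(P(x, n) - math.sqrt(x)) for x in grid) for n in (10, 100, 1000)}
abs_grid = [2 * x - 1 for x in grid]                # points of [-1, 1]
abs_err = max(abs(P(t * t, 1000) - abs(t)) for t in abs_grid)
print(sqrt_err, abs_err)
```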
▻ 12. Let $p_n$ be a polynomial of degree $m_n$, and suppose that $p_n \rightrightarrows f$ on $[a, b]$, where $f$ is not a polynomial. Show that $m_n \to \infty$.

13. Show that the set of all polynomials $\mathcal{P}$ is a first category set in $C[a, b]$.

14. Let $f \in C[a, b]$ be continuously differentiable, and let $\varepsilon > 0$. Show that there is a polynomial $p$ such that $\|f - p\|_\infty < \varepsilon$ and $\|f' - p'\|_\infty < \varepsilon$. Conclude that $C^{(1)}[a, b]$ is separable.

15. Construct a sequence of polynomials that converge uniformly on $[0, 1]$ but whose derivatives fail to converge uniformly.

16. Prove that there is a sequence of polynomials $(p_n)$ such that $p_n \to 0$ pointwise on $[0, 1]$, but such that $\int_0^1 p_n(x)\,dx \to 3$.

17. Suppose that $f : [1, \infty) \to \mathbb{R}$ is continuous and that $\lim_{x \to \infty} f(x)$ exists. For $\varepsilon > 0$, show that there is a polynomial $p$ such that $|f(x) - p(1/x)| < \varepsilon$ for all $x \ge 1$.

18. Find $B_n(f)$ for $f(x) = x^3$. [Hint: $k^2 = (k - 1)(k - 2) + 3(k - 1) + 1$.] Note that the same calculation can be used to show that if $f \in \mathcal{P}_m$, then $B_n(f) \in \mathcal{P}_m$ for any $n \ge m$.

19. Here is an alternate approach to Exercise 14: If $f$ is continuously differentiable on $[0, 1]$, show that $B_{n+1}(f)' \rightrightarrows f'$ on $[0, 1]$. [Hint: The mean value theorem and a bit of rewriting allow for the comparison of $B_{n+1}(f)'$ and $B_n(f')$. If we set $p_{n,k}(x) = \binom{n}{k} x^k (1 - x)^{n-k}$, show that $p'_{n+1,k} = (n + 1)(p_{n,k-1} - p_{n,k})$.]

$\mathrm{Lip}_K\,\alpha$ denotes the set of functions $f \in C[0, 1]$ that are Lipschitz of order $\alpha$ with constant $K$ on $[0, 1]$, where $0 < \alpha \le 1$ and $0 < K < \infty$. That is, $f \in \mathrm{Lip}_K\,\alpha$ if $|f(x) - f(y)| \le K|x - y|^\alpha$ for all $x, y \in [0, 1]$. (See Exercises 8.57-8.60 for more details.) We write $\mathrm{Lip}\,\alpha$ for the set of $f$ that are in $\mathrm{Lip}_K\,\alpha$ for some $K$; that is, $\mathrm{Lip}\,\alpha = \bigcup_{K=1}^{\infty} \mathrm{Lip}_K\,\alpha$.

20. Show that $\mathrm{Lip}_K\,\alpha$ is closed in $C[0, 1]$. In fact, if a sequence $(f_n)$ in $\mathrm{Lip}_K\,\alpha$ converges pointwise to $f$ on $[0, 1]$, show that $f \in \mathrm{Lip}_K\,\alpha$. Is $\mathrm{Lip}_K\,\alpha$ a subspace of $C[0, 1]$?

21. Show that $\mathrm{Lip}\,\alpha$ is a subspace of $C[0, 1]$. Is $\mathrm{Lip}\,\alpha$ a subalgebra of $C[0, 1]$?

22. Show that every polynomial is in $\mathrm{Lip}\,1$, but that $\sqrt{x}$, for example, is not.

23. Show that $x^\alpha \in \mathrm{Lip}\,\alpha$. For which $\beta > 0$ is $x^\beta \in \mathrm{Lip}\,\alpha$?

24. Prove that $\mathrm{Lip}\,1$ is not closed in $C[0, 1]$. In fact, $\mathrm{Lip}\,1$ is both dense and of first category in $C[0, 1]$. [Hint: For $\varepsilon > 0$, find $f \notin \mathrm{Lip}_K\,1$ with $\|f\|_\infty < \varepsilon$. That is, show that $\mathrm{Lip}_K\,1$ is nowhere dense.]

25. Prove that the set $\mathcal{P}$ of all polynomials is both dense and of first category in $C^{(1)}[0, 1]$.

26. For each $f \in \mathrm{Lip}\,\alpha$, define $N_\alpha(f) = \sup_{x \ne y} \big[\, |f(x) - f(y)| / |x - y|^\alpha \,\big]$.

(a) Show that $N_\alpha$ defines a seminorm on $\mathrm{Lip}\,\alpha$.

(b) Show that $\|f\|_{\mathrm{Lip}\,\alpha} = \|f\|_\infty + N_\alpha(f)$ defines a complete norm on $\mathrm{Lip}\,\alpha$.
Trigonometric Polynomials

In a follow-up to the paper in which Weierstrass established his famous theorem on approximation by algebraic polynomials, he proved an analogous result on approximation by trigonometric polynomials. In this section we will outline Lebesgue’s elementary proof of Weierstrass’s result.

To begin, a trigonometric polynomial (or, briefly, a trig polynomial) is a finite linear combination of the functions $\cos kx$ and $\sin kx$ for $k = 0, \ldots, n$, that is, a function of the form

$$
T(x) = a_0 + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx), \tag{11.1}
$$

where, for our purposes, the $a_k$ and $b_k$ are real numbers. The degree of a trig polynomial, as you might expect, is the order of its highest nonzero coefficient; thus, the trig polynomial $T$ displayed above has degree exactly $n$ if at least one of $a_n$ or $b_n$ is different from 0.

Our first project is to justify the use of the word “polynomial” here by showing that a trigonometric polynomial is actually an algebraic polynomial (of the same degree) in $\cos x$ and $\sin x$.

Lemma 11.7. cos nx and sin(n + 1)jc/ sin jc can be written as polynomials of 
degree exactly n in cos x for any integer n > 1. 

Proof. By using the recurrence formula

$$\cos kx + \cos(k-2)x = 2\cos(k-1)x\,\cos x,$$

it is easy to check that cos 2x = 2cos²x − 1, cos 3x = 4cos³x − 3cos x, and cos 4x = 8cos⁴x − 8cos²x + 1. More generally, it follows by induction that cos nx is a polynomial of degree n in cos x with leading coefficient 2^{n−1}. Using this fact and the identity

$$\sin(k+1)x - \sin(k-1)x = 2\cos kx\,\sin x,$$

it follows (again by induction) that sin(n + 1)x can be written as sin x times a polynomial of degree n in cos x with leading coefficient 2^n. □
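The induction in the proof is easy to mechanize. The Python sketch below (our own illustration; the function names are hypothetical) builds the coefficients of cos nx as a polynomial in t = cos x directly from the recurrence cos kx = 2 cos x cos(k − 1)x − cos(k − 2)x, so one can watch the leading coefficient 2^{n−1} appear.

```python
import math

def cos_multiple_coeffs(n):
    """Return c[0..n] with cos(n*x) == sum(c[j] * cos(x)**j), built from
    the recurrence cos(kx) = 2 cos(x) cos((k-1)x) - cos((k-2)x)."""
    prev, cur = [1], [0, 1]              # cos(0x) = 1, cos(1x) = cos x
    if n == 0:
        return prev
    for _ in range(2, n + 1):
        nxt = [0] + [2 * c for c in cur]  # 2 cos x * cos((k-1)x)
        for j, c in enumerate(prev):      # minus cos((k-2)x)
            nxt[j] -= c
        prev, cur = cur, nxt
    return cur

def eval_in_cos(coeffs, x):
    """Evaluate sum(coeffs[j] * cos(x)**j)."""
    t = math.cos(x)
    return sum(c * t**j for j, c in enumerate(coeffs))

print(cos_multiple_coeffs(4))   # [1, 0, -8, 0, 8], i.e. 8cos^4 x - 8cos^2 x + 1
```

The same loop, run with the sine identity from the proof, would produce the polynomials for sin(n + 1)x / sin x.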


EXERCISES 

▻ 27. Let T be a trig polynomial. Prove:

(a) If T is an even function, then T can be written using only cosines.

(b) If T is an odd function, then T can be written using only sines.

28. Show that there is an algebraic polynomial p(t) of degree exactly 2k such that sin^{2k} x = p(cos x).

▻ 29. Given a trig polynomial T(x) of degree n, show that there is an algebraic polynomial p(t, s) of degree exactly n (in two variables) such that T(x) = p(cos x, sin x). [Hint: p(t, s) can be chosen to be of the form q(t) + r(t)s for some polynomials q and r.] If T is an even function, then there is an algebraic polynomial p(t) of degree exactly n such that T(x) = p(cos x).

Conversely, every algebraic polynomial in cos x and sin x is also a trig polynomial 
(of the same degree). One way to see this is by induction: 

30.

(a) Show that an algebraic polynomial in cos x and sin x can always be written using only functions of the form cos^n x and cos^m x sin x.

(b) Use induction to show that cos^n x is a trig polynomial of degree exactly n; in particular, cos^n x can be written as Σ_{k=0}^{n} b_k cos kx, where b_n = 2^{−n+1}. [Hint: 2 cos α cos β = cos(α + β) + cos(α − β).]

(c) Show that cos^m x sin x is a trig polynomial of degree exactly m + 1.

Our insights on trig polynomials will shed some light on the Fourier series rep- 
resentation of a continuous function. 

31. Let f : R → R be continuous and 2π-periodic, and suppose that all of the Fourier coefficients for f vanish; that is, ∫_{−π}^{π} f(x) cos nx dx = 0 and ∫_{−π}^{π} f(x) sin nx dx = 0 for all n = 0, 1, 2, .... This exercise outlines a proof, due to Lebesgue, that f ≡ 0.

(a) If f(x_0) = c > 0 for some point x_0, then there exists 0 < δ < π such that f(x) > c/2 for all x with |x − x_0| < δ.

(b) The functions T_m(x) = [1 + cos(x − x_0) − cos δ]^m, m = 1, 2, 3, ..., satisfy T_m(x) ≥ 1 for |x − x_0| ≤ δ and |T_m(x)| < 1 elsewhere in the interval [x_0 − π, x_0 + π]. In fact, the sequence (T_m) converges uniformly to 0 on the intervals [x_0 − π, x_0 − δ′] and [x_0 + δ′, x_0 + π] for any δ < δ′ < π.

(c) By first taking δ′ sufficiently close to δ and then choosing m sufficiently large, show that ∫_{−π}^{π} f(x) T_m(x) dx > δc/2 > 0.

(d) By showing that T_m is a trig polynomial of degree m, conclude from our assumptions on f that ∫_{−π}^{π} f(x) T_m(x) dx = 0, a contradiction.
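To get a feel for part (b) before proving it, one can sample the kernel T_m numerically. The Python sketch below is only an illustration, not a proof: the names are hypothetical and the values x_0 = 0.3, δ = 0.5, δ′ = 0.6 are arbitrary choices. It samples T_m near x_0, where it stays at least 1, and reports the sampled maximum of |T_m| on the two outer intervals, which shrinks as m grows.

```python
import math

def lebesgue_kernel(x, x0, delta, m):
    """T_m(x) = [1 + cos(x - x0) - cos(delta)]**m from Exercise 31(b)."""
    return (1.0 + math.cos(x - x0) - math.cos(delta)) ** m

def sup_away_from_x0(x0, delta, delta_prime, m, samples=2000):
    """Largest sampled |T_m(x)| over delta' <= |x - x0| <= pi."""
    best = 0.0
    for i in range(samples + 1):
        r = delta_prime + (math.pi - delta_prime) * i / samples
        for x in (x0 - r, x0 + r):
            best = max(best, abs(lebesgue_kernel(x, x0, delta, m)))
    return best

x0, delta, delta_prime = 0.3, 0.5, 0.6
for m in (10, 50, 200):
    print(m, sup_away_from_x0(x0, delta, delta_prime, m))
```

On the outer intervals the base lies strictly between −1 and 1, with absolute value at most max(1 + cos δ′ − cos δ, |cos δ|), so the decay is geometric in m.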


The trig polynomials belong to the set of all 2π-periodic continuous functions f : R → R, a space that we will denote by C^{2π}. If we write T_n to denote the collection of trig polynomials of degree at most n, then T_n is a subspace of C^{2π}; the collection of all trig polynomials is even a subalgebra of C^{2π} (see Exercise 32).

A bit of linear algebra will now permit us to summarize our results quite succinctly (giving an alternate proof of Exercise 29 while we're at it). First, the 2n + 1 functions in the set

A = {1, cos x, cos 2x, ..., cos nx, sin x, sin 2x, ..., sin nx}

are linearly independent; the easiest way to see this is to notice that we may define an inner product on C^{2π} under which these functions are orthogonal. Specifically,

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\,g(x)\,dx = 0, \qquad \langle f, f \rangle = \int_{-\pi}^{\pi} f(x)^2\,dx \neq 0$$

for any pair of functions f ≠ g ∈ A. (See Exercises 10.2 and 10.3 or Exercise 33, below. We will pursue this observation in greater detail later in the book.) Second, we have shown that each element of A lives in the space spanned by the 2n + 1 functions in the set

B = {1, cos x, cos²x, ..., cos^n x, sin x, cos x sin x, ..., cos^{n−1} x sin x}.

That is,

T_n = span A ⊆ span B.

By comparing dimensions, we have

2n + 1 = dim T_n = dim(span A) ≤ dim(span B) ≤ 2n + 1,

and hence we must have span A = span B. The point here is that T_n is a finite-dimensional subspace of C^{2π} of dimension 2n + 1, and we may use either one of these sets of functions as a basis for T_n.
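The orthogonality of A can be checked numerically by computing the matrix of inner products (the Gram matrix). The Python sketch below is our own illustration with hypothetical names. It approximates ⟨f, g⟩ with the equally spaced rectangle rule on [−π, π], which integrates trig polynomials of degree below N exactly (in exact arithmetic); the products here have degree at most 2n < N, so only rounding error remains.

```python
import math

def basis_A(n):
    """The 2n + 1 functions 1, cos kx, sin kx (k = 1, ..., n) of the set A."""
    funcs = [lambda x: 1.0]
    funcs += [lambda x, k=k: math.cos(k * x) for k in range(1, n + 1)]
    funcs += [lambda x, k=k: math.sin(k * x) for k in range(1, n + 1)]
    return funcs

def inner(f, g, N=64):
    """<f, g> = integral of f*g over [-pi, pi], via the equally spaced
    rectangle rule (exact for trig polynomials of degree < N)."""
    h = 2 * math.pi / N
    return h * sum(f(-math.pi + j * h) * g(-math.pi + j * h) for j in range(N))

def gram(n):
    """Matrix of inner products <f, g> for f, g in A."""
    A = basis_A(n)
    return [[inner(f, g) for g in A] for f in A]

for row in gram(1):
    print([round(v, 6) for v in row])
```

A diagonal Gram matrix with nonzero diagonal (2π for the constant, π for the others) is exactly the statement that A is an orthogonal, hence linearly independent, set.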


EXERCISES 

32. Show that the product of two trig polynomials is again a trig polynomial. Consequently, the collection of all trig polynomials is both a subspace and a subalgebra of C^{2π}.

33.

(a) Check that the functions 1, cos x, sin x, ..., cos nx, sin nx are orthogonal. That is, show that ∫_{−π}^{π} fg = 0 for any pair of functions f ≠ g from this list, and that ∫_{−π}^{π} f² ≠ 0 for any f from the list.

(b) Conclude that the functions 1, cos x, sin x, ..., cos nx, sin nx are linearly independent (over either R or C). [Hint: Show that the coefficients in equation (11.1) can be uniquely determined.]

34. Show that the functions e^{ikx} = cos kx + i sin kx, k = −n, ..., n, are linearly independent (again, over either R or C). [Hint: The integral of a complex-valued function f = u + iv, where u and v are real-valued, is defined as ∫f = ∫u + i∫v.]


An alternate approach here is to note that every trig polynomial is actually an algebraic polynomial with complex coefficients in z = e^{ix} = cos x + i sin x and z̄ = e^{−ix} = cos x − i sin x, that is, a linear combination of complex exponentials of the form

$$\sum_{k=-n}^{n} c_k e^{ikx}, \qquad (11.2)$$

where the c_k are allowed to be complex numbers. We will call this form a complex trig polynomial (of degree n) and distinguish it from our original form by referring to that as a real trig polynomial.
Using DeMoivre's formula (cos x + i sin x)^n = cos nx + i sin nx, we can give an alternate proof of Lemma 11.7. Indeed, notice that

$$\cos nx = \operatorname{Re}\big[(\cos x + i\sin x)^n\big] = \operatorname{Re}\left[\sum_{k=0}^{n}\binom{n}{k}(i\sin x)^k\cos^{n-k}x\right] = \sum_{k=0}^{\lfloor n/2\rfloor}\binom{n}{2k}(\cos^2 x - 1)^k\cos^{n-2k}x,$$

where we have written i² sin²x = cos²x − 1. The coefficient of cos^n x on the right-hand side is then

$$\sum_{k=0}^{\lfloor n/2\rfloor}\binom{n}{2k} = 2^{n-1}.$$

(All of the binomial coefficients together sum to (1 + 1)^n = 2^n, but the even or odd terms, taken separately, sum to exactly half this amount since (1 + (−1))^n = 0.) Similarly,

$$\sin(n+1)x = \operatorname{Im}\big[(\cos x + i\sin x)^{n+1}\big] = \operatorname{Im}\left[\sum_{k=0}^{n+1}\binom{n+1}{k}(i\sin x)^k\cos^{n+1-k}x\right] = \sum_{k=0}^{\lfloor n/2\rfloor}\binom{n+1}{2k+1}(\cos^2 x - 1)^k\cos^{n-2k}x\,\sin x,$$

where we have written (i sin x)^{2k+1} = i(cos²x − 1)^k sin x. The coefficient of cos^n x sin x on the right-hand side is

$$\sum_{k=0}^{\lfloor n/2\rfloor}\binom{n+1}{2k+1} = 2^{n}.$$
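Both leading-coefficient identities are sums of alternate binomial coefficients, and they are easy to confirm for many n at once. The short Python check below is our own illustration (function names are hypothetical); it uses `math.comb` from the standard library and the summation range k = 0, ..., ⌊n/2⌋ from the displays above.

```python
from math import comb

def cos_leading_coeff(n):
    """Coefficient of cos^n x in cos nx: sum of C(n, 2k), k = 0..floor(n/2)."""
    return sum(comb(n, 2 * k) for k in range(n // 2 + 1))

def sin_leading_coeff(n):
    """Coefficient of cos^n x sin x in sin (n+1)x:
    sum of C(n+1, 2k+1), k = 0..floor(n/2)."""
    return sum(comb(n + 1, 2 * k + 1) for k in range(n // 2 + 1))

for n in range(1, 7):
    print(n, cos_leading_coeff(n), sin_leading_coeff(n))
```

Note that for even n the last odd term, k = n/2, is C(n + 1, n + 1) = 1; dropping it would spoil the total 2^n.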


Obviously, every real trig polynomial can be written as a complex trig polynomial, since cos nx = (1/2)(e^{inx} + e^{−inx}) and sin nx = (1/2i)(e^{inx} − e^{−inx}), but notice that, in general, we must use complex coefficients c_k to represent real trig polynomials. Conversely, every complex trig polynomial can be written as a linear combination of sines and cosines but, again, typically with complex coefficients.

The point here is that only certain complex trig polynomials represent real-valued functions. Indeed, the real trig polynomials correspond to the real parts of the complex trig polynomials. To see this, notice that equation (11.2) represents a real-valued function if and only if

$$\sum_{k=-n}^{n} c_k e^{ikx} = \overline{\sum_{k=-n}^{n} c_k e^{ikx}} = \sum_{k=-n}^{n} \overline{c_k}\, e^{-ikx} = \sum_{k=-n}^{n} \overline{c_{-k}}\, e^{ikx},$$

that is, c_k = \overline{c_{−k}} for each k. In particular, c_0 must be real, and hence

$$\begin{aligned}
\sum_{k=-n}^{n} c_k e^{ikx} &= c_0 + \sum_{k=1}^{n}\left(c_k e^{ikx} + c_{-k} e^{-ikx}\right) \\
&= c_0 + \sum_{k=1}^{n}\left(c_k e^{ikx} + \overline{c_k}\, e^{-ikx}\right) \\
&= c_0 + \sum_{k=1}^{n}\left[(c_k + \overline{c_k})\cos kx + i(c_k - \overline{c_k})\sin kx\right] \\
&= c_0 + \sum_{k=1}^{n}\left[2\operatorname{Re}(c_k)\cos kx - 2\operatorname{Im}(c_k)\sin kx\right],
\end{aligned}$$

which is of the form (11.1) with a_k and b_k real.

Conversely, given any real trig polynomial (11.1), we have

$$a_0 + \sum_{k=1}^{n}\left(a_k\cos kx + b_k\sin kx\right) = a_0 + \sum_{k=1}^{n}\left[\left(\frac{a_k - i b_k}{2}\right)e^{ikx} + \left(\frac{a_k + i b_k}{2}\right)e^{-ikx}\right],$$

which is of the form (11.2) with c_{−k} = \overline{c_k} for each k.
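The two conversions just derived translate directly into code. The Python sketch below is our own illustration, and all function names are hypothetical. It maps real coefficients (a_k, b_k) to complex coefficients c_k via c_k = (a_k − ib_k)/2 and c_{−k} = (a_k + ib_k)/2, and maps them back via a_k = 2 Re c_k and b_k = −2 Im c_k. Both forms can then be evaluated to confirm that they describe the same real-valued function.

```python
import cmath
import math

def real_to_complex(a, b):
    """(11.1) -> (11.2): a = [a_0..a_n], b = [b_0..b_n] with b_0 unused.
    Returns {k: c_k}: c_0 = a_0, c_k = (a_k - i b_k)/2, c_{-k} = (a_k + i b_k)/2."""
    n = len(a) - 1
    c = {0: complex(a[0])}
    for k in range(1, n + 1):
        c[k] = (a[k] - 1j * b[k]) / 2
        c[-k] = (a[k] + 1j * b[k]) / 2
    return c

def complex_to_real(c, n):
    """(11.2) -> (11.1), assuming c_{-k} = conj(c_k):
    a_k = 2 Re c_k, b_k = -2 Im c_k (and a_0 = c_0, which is real)."""
    a = [c[0].real] + [2 * c[k].real for k in range(1, n + 1)]
    b = [0.0] + [-2 * c[k].imag for k in range(1, n + 1)]
    return a, b

def eval_real(a, b, x):
    """Evaluate the real form (11.1)."""
    return a[0] + sum(a[k] * math.cos(k * x) + b[k] * math.sin(k * x)
                      for k in range(1, len(a)))

def eval_complex(c, x):
    """Evaluate the complex form (11.2)."""
    return sum(ck * cmath.exp(1j * k * x) for k, ck in c.items())

c = real_to_complex([0.5, 1.0, -2.0], [0.0, 3.0, 0.25])
print(eval_real([0.5, 1.0, -2.0], [0.0, 3.0, 0.25], 0.4), eval_complex(c, 0.4))
```

The imaginary part of `eval_complex` is zero up to rounding precisely because c_{−k} is the conjugate of c_k.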

The real trig polynomials of degree at most n are the real linear span of the functions 1, cos x, sin x, ..., cos nx, sin nx, and hence form a vector space of dimension 2n + 1 over R. The complex trig polynomials of degree at most n are the complex linear span of 1, cos x, sin x, ..., cos nx, sin nx, and so form a vector space of dimension 2n + 1 over C, or of dimension 2(2n + 1) over R. Obviously, if we want to restrict our attention to real-valued functions, we want only "half" of the complex trig polynomials.

Now we are ready to talk about approximating a continuous function by a trig polynomial. (Henceforth, "trig polynomial" means "real trig polynomial.") Since each trig polynomial is periodic with period 2π, we can only expect to approximate functions that are likewise periodic with period 2π. In fact, it is easy to see that even the pointwise limit of a sequence of 2π-periodic functions is again 2π-periodic, and so the same will be true for uniform limits.

Each f ∈ C^{2π} is completely determined by its values on, say, [−π, π], and so we can norm C^{2π} by setting ‖f‖∞ = max_{|x|≤π} |f(x)|. Please note that each element of C^{2π} is necessarily uniformly continuous on R. (Why?)

Weierstrass's Second Theorem 11.8. Given f ∈ C^{2π} and ε > 0, there is a trig polynomial T such that ‖f − T‖∞ < ε. Hence, there is a sequence of trig polynomials (T_n) such that T_n ⇉ f on R.

We will show that Weierstrass's second theorem follows from his first (Theorem 11.3). To begin, we need a simple lemma.

Lemma 11.9. Given an even function f ∈ C^{2π} and ε > 0, there is an even trig polynomial T such that ‖f − T‖∞ < ε.
Proof. The simple trick here is to note that g(y) = f(arccos y) defines a continuous function for −1 ≤ y ≤ 1. Thus, by Theorem 11.3, there is an algebraic polynomial p such that max_{|y|≤1} |f(arccos y) − p(y)| < ε. But then, T(x) = p(cos x) is an even trig polynomial, and, clearly, max_{0≤x≤π} |f(x) − p(cos x)| < ε. Since f is even, it follows that ‖f − T‖∞ < ε. □
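Theorem 11.3 only asserts that the polynomial p exists. To make Lemma 11.9 concrete, the Python sketch below swaps in Bernstein polynomials to produce p. These give a standard constructive proof of the Weierstrass theorem, but they are our own choice here, not the argument of this book. The sketch then forms T(x) = p(cos x). The function names and the test function e^{cos x} are hypothetical.

```python
import math

def bernstein(g, n, lo=-1.0, hi=1.0):
    """Degree-n Bernstein polynomial of g on [lo, hi], returned as a callable."""
    weights = [g(lo + (hi - lo) * k / n) * math.comb(n, k) for k in range(n + 1)]

    def p(y):
        t = (y - lo) / (hi - lo)          # map [lo, hi] onto [0, 1]
        return sum(w * t**k * (1 - t)**(n - k) for k, w in enumerate(weights))

    return p

def even_trig_approx(f, n):
    """Lemma 11.9 sketch: approximate y -> f(arccos y) on [-1, 1] by p,
    then return T(x) = p(cos x), an even trig polynomial of degree at most n."""
    p = bernstein(lambda y: f(math.acos(y)), n)
    return lambda x: p(math.cos(x))

def sup_error(f, T, samples=1000):
    """Sampled max of |f(x) - T(x)| over [-pi, pi]."""
    xs = (-math.pi + 2 * math.pi * j / samples for j in range(samples + 1))
    return max(abs(f(x) - T(x)) for x in xs)

def sample_even(x):
    """Hypothetical test function: even and 2pi-periodic."""
    return math.exp(math.cos(x))

for n in (25, 100, 400):
    print(n, sup_error(sample_even, even_trig_approx(sample_even, n)))
```

Bernstein polynomials converge slowly (roughly like 1/n for smooth g), but their convergence is guaranteed for every continuous g, which is all the lemma needs.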

The rest of the proof of Weierstrass's second theorem consists of several clever applications of Lemma 11.9.

Proof. Given f ∈ C^{2π}, note that both of the functions

f(x) + f(−x)  and  [f(x) − f(−x)] sin x

are even. Thus, from Lemma 11.9, there are even trig polynomials T_1 and T_2 such that

f(x) + f(−x) = T_1(x) + d_1(x)  and  [f(x) − f(−x)] sin x = T_2(x) + d_2(x),

where ‖d_1‖∞ < ε/4 and ‖d_2‖∞ < ε/4. By multiplying the first equation by sin²x, the second by sin x, and adding the results, we get

$$2f(x)\sin^2 x = T_3(x) + d_3(x), \qquad (11.3)$$

where T_3 is a trig polynomial and ‖d_3‖∞ < ε/2. But since this is true for any f ∈ C^{2π}, it must also hold for the function f(x − π/2); in other words, we also have 2f(x − π/2) sin²x = T_4(x) + d_4(x), where T_4 is a trig polynomial and ‖d_4‖∞ < ε/2. Thus, after replacing x by x + π/2, we have

$$2f(x)\cos^2 x = T_5(x) + d_5(x), \qquad (11.4)$$

where T_5 is a trig polynomial and ‖d_5‖∞ < ε/2. Finally, adding equations (11.3) and (11.4),

$$2f(x) = T_6(x) + d_6(x),$$

where T_6 is a trig polynomial and ‖d_6‖∞ < ε. That is, ‖f − T_6/2‖∞ < ε/2 < ε. □

To round off our discussion of Weierstrass's second theorem, we next show that Theorem 11.8 implies Theorem 11.3. By Lemma 11.1, it is enough to show that Theorem 11.3 holds in, say, C[−1, 1]. But, given f ∈ C[−1, 1], note that f(cos x) ∈ C[0, π]. In fact, f(cos x) defines an even function in C^{2π}. Thus, by Theorem 11.8, there is a trig polynomial T such that |f(cos x) − T(x)| < ε for all x ∈ R. Then, since f(cos x) is even, it follows that |f(cos x) − T(−x)| < ε for all x ∈ R, too. Hence, the even trig polynomial g(x) = [T(x) + T(−x)]/2 likewise satisfies |f(cos x) − g(x)| < ε for all x ∈ R. (Why?) Finally, from Exercise 29, there is an algebraic polynomial p such that g(x) = p(cos x). That is, |f(cos x) − p(cos x)| < ε for all x ∈ R, and hence |f(t) − p(t)| < ε for all t ∈ [−1, 1].

The conclusion here is that Weierstrass's two theorems are logically equivalent. This observation may seem pointless; after all, we used Theorem 11.3 to prove Theorem 11.8. But there are many independent proofs of Weierstrass's two theorems. The real point here is that it is necessary only to prove one of the two; the other will follow from elementary arguments. We will find plenty of applications of Weierstrass's approximation theorems in Part Three.


EXERCISES 

35. Prove that C^{2π} is complete.

36. Prove that C^{2π} is separable.

37. Let f be Riemann integrable on [−π, π], and let ε > 0. Prove:

(a) There is a function g ∈ C[−π, π] satisfying ∫_{−π}^{π} |f(x) − g(x)| dx < ε.

(b) There is a continuous, 2π-periodic function h ∈ C^{2π} satisfying ∫_{−π}^{π} |f(x) − h(x)| dx < ε.

(c) There is a trig polynomial T with ∫_{−π}^{π} |f(x) − T(x)| dx < ε.

38. Show that each element of C^{2π} is uniquely determined by its Fourier series. That is, show that if f ∈ C^{2π} and ∫_{−π}^{π} f(x) cos nx dx = 0 and ∫_{−π}^{π} f(x) sin nx dx = 0 for all n = 0, 1, 2, ..., then f = 0. [Hint: For an easy proof, modify the argument used in Application 11.6.]

39. Let f ∈ C^{2π}. If the Fourier series for f is uniformly convergent on R, prove that it must, in fact, converge to f. [Hint: Combine the arguments of Example 10.6 and the previous exercise.]

40. If f : R → R is twice continuously differentiable and 2π-periodic, prove that the Fourier series for f converges uniformly to f. [Hint: See Exercise 10.4.]


Infinitely Differentiable Functions 

The value in approximating by algebraic or trigonometric polynomials should be obvi- 
ous: Polynomials are well behaved. Either type of polynomial is not only continuous, 
but differentiable. In fact, either sort of polynomial has continuous derivatives of all 
orders; in other words, they are infinitely differentiable. Thus, while the typical function 
in C[0, 1] or C^{2π} may not be differentiable at any point, it is nevertheless close to one
that is infinitely differentiable. Our goal in this section is to show how this result extends 
to C(R). The Weierstrass theorem will do most of the work for us; all that is lacking 
is a method for constructing infinitely differentiable functions with certain prescribed 
properties. 

The class of infinitely differentiable functions f : R → R is denoted by C^∞(R). That is, f ∈ C^∞(R) if and only if f has continuous derivatives of all orders on R. Obviously, C^∞(R) is both a subspace and a subalgebra of C(R).

Lemma 11.10. There is an f ∈ C^∞(R) such that f(x) = 0 for x ≤ 0 and f(x) > 0 for x > 0.
Proof. Define f by f(x) = 0 for x ≤ 0 and f(x) = e^{−1/x} for x > 0. It is clear that f is infinitely differentiable everywhere except, possibly, at x = 0. Notice that f′(x) = x^{−2}e^{−1/x} and f″(x) = (x^{−4} − 2x^{−3})e^{−1/x} for x > 0. Using induction, it is easy to see that f^{(k)}(x) = p_k(x^{−1}) f(x) for x > 0, where p_k is a polynomial of degree at most 2k. Of course, f^{(k)}(x) = 0 for x < 0 and any k.

To see that f is continuous at 0, first note that if y > 0, then e^y = Σ_{m=0}^{∞} y^m/m! > y^m/m! for any m = 0, 1, 2, .... Thus, for x > 0,

0 < f(x) = e^{−1/x} = (e^{1/x})^{−1} < m! x^m,

for m = 0, 1, 2, .... In particular, f(x) → 0 as x → 0. Likewise, f(x)/x → 0 and f′(x) = x^{−2} f(x) → 0 as x → 0. That is, f′ exists and is continuous at x = 0, and f′(0) = 0.

Suppose that we have shown that f^{(k)} exists and is continuous at 0. Then, of course, f^{(k)}(0) = 0. Thus, f^{(k)}(x)/x = x^{−1} p_k(x^{−1}) f(x) for x > 0. And since p_k has degree at most 2k, and since |f(x)| < (2k + 2)! x^{2k+2}, it follows that f^{(k)}(x)/x → 0 as x → 0. That is, f^{(k+1)}(0) exists and equals 0. A similar argument shows that f^{(k+1)}(x) = p_{k+1}(x^{−1}) f(x) → 0 as x → 0; that is, f^{(k+1)} is continuous at 0. By induction, f^{(k)} exists and is continuous at 0 for all k. □
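The induction rests on the formula f^{(k)}(x) = p_k(1/x) f(x) for x > 0. Differentiating it once more gives the recurrence p_{k+1}(u) = u²(p_k(u) − p_k′(u)) with p_0 = 1. This is our own derivation, but it reproduces the f′ and f″ displayed above, and it makes the degree bound 2k visible. The Python sketch below (hypothetical names) implements the recurrence, compares it with finite differences, and shows how small f^{(k)}(x)/x already is near 0.

```python
import math

def flat(x):
    """Lemma 11.10: f(x) = e^{-1/x} for x > 0 and f(x) = 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def next_p(p):
    """p[j] is the coefficient of u**j. Differentiating p_k(1/x) f(x)
    gives p_{k+1}(u) = u**2 * (p_k(u) - p_k'(u))."""
    deriv = [j * p[j] for j in range(1, len(p))] + [0]
    return [0, 0] + [a - b for a, b in zip(p, deriv)]

def p_coeffs(k):
    """Coefficients of p_k, starting from p_0 = 1 (since f^(0) = f)."""
    p = [1]
    for _ in range(k):
        p = next_p(p)
    return p

def kth_derivative(k, x):
    """f^(k)(x) = p_k(1/x) f(x) for x > 0, and 0 for x <= 0."""
    if x <= 0:
        return 0.0
    u = 1.0 / x
    return sum(c * u**j for j, c in enumerate(p_coeffs(k))) * flat(x)

print(p_coeffs(2))   # [0, 0, 0, -2, 1], i.e. u^4 - 2u^3
```

Since the top coefficient of p_k is always 1 and the degree rises by exactly 2 at each step, p_k has degree exactly 2k, comfortably within the bound used in the proof.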

The function f constructed in Lemma 11.10 is an important example. All of the derivatives of f vanish at 0, but f is not identically 0. Thus, the Taylor series expansion for f about 0 does not converge to f. In fact, no convergent power series Σ a_n x^n can represent f in any neighborhood of 0.

Given f, it is easy to construct all sorts of C^∞ functions:

Lemma 11.11. There is a g ∈ C^∞(R) such that g(x) = 0 for |x| ≥ 1 and g(x) > 0 for |x| < 1.

Proof. Let g(x) = f(x + 1) f(1 − x), where f is the function constructed in Lemma 11.10. □

Lemma 11.12. There is an h ∈ C^∞(R) such that

(i) h(x) = 0 for |x| ≥ 1, 0 < h(x) ≤ 1 for |x| < 1, and h(0) = 1;

(ii) Given n ∈ Z and n ≤ x ≤ n + 1, we have h(x − n) + h(x − n − 1) = 1, while h(x − k) = 0 for any integer k < n or k > n + 1.

Proof. Let g be the function constructed in Lemma 11.11, and consider the function G(x) = Σ_{n∈Z} g(x − n). This series is actually a finite sum in a small neighborhood about any point x ∈ R. Indeed, if n ∈ Z is chosen so that n − 1 < x < n + 1, then at most three terms, namely g(x − n + 1), g(x − n), and g(x − n − 1), are nonzero (and at least one is strictly positive). That is, G(x) = g(x − n + 1) + g(x − n) + g(x − n − 1) on n − 1 < x < n + 1 (and G(x) = g(x − n) + g(x − n − 1) if n ≤ x ≤ n + 1). Consequently, the series converges to an infinitely differentiable function G(x) and, moreover, G(x) > 0 for any x. Finally, if we set h(x) = g(x)/G(x), then it is easy to check that h has the properties stated in the lemma. □
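The construction of Lemmas 11.10 through 11.12 can be carried out literally in floating point. The Python sketch below is our own illustration; the names `flat`, `bump`, `G`, and `h` are hypothetical labels for f, g, G, and h. It truncates the series for G to the only terms that can be nonzero near x, and it checks property (ii) on a grid.

```python
import math

def flat(x):
    """Lemma 11.10: e^{-1/x} for x > 0, else 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def bump(x):
    """Lemma 11.11: g(x) = f(x + 1) f(1 - x), positive exactly on (-1, 1)."""
    return flat(x + 1) * flat(1 - x)

def G(x):
    """G(x) = sum of g(x - n) over n in Z. Only n with |x - n| < 1 contribute,
    and all of those lie among floor(x) - 1, ..., floor(x) + 2."""
    m = math.floor(x)
    return sum(bump(x - n) for n in range(m - 1, m + 3))

def h(x):
    """Lemma 11.12: h = g / G."""
    return bump(x) / G(x)

for x in (0.0, 0.25, 0.5, 0.75):
    print(x, h(x), h(x - 1), h(x) + h(x - 1))
```

Because G is 1-periodic, h(x − n) = g(x − n)/G(x), and on [n, n + 1] only g(x − n) and g(x − n − 1) survive in G(x); that is why the two values of h always sum to 1.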
Now let's bring the Weierstrass theorem back into the picture.

Theorem 11.13. Given f ∈ C(R) and ε > 0, there is a function φ ∈ C^∞(R) such that |f(x) − φ(x)| < ε for all x ∈ R. Hence, there is a sequence (φ_n) in C^∞(R) such that φ_n ⇉ f on R.

Proof. For each n ∈ Z, Theorem 11.3 supplies a polynomial p_n such that |f(x) − p_n(x)| < ε for all n − 1 ≤ x ≤ n + 1. Now define φ by φ(x) = Σ_{n∈Z} p_n(x) h(x − n), where h is the function constructed in Lemma 11.12. This series is actually a finite sum over any bounded interval, so φ ∈ C^∞(R). And, from Lemma 11.12 (ii), if n ≤ x ≤ n + 1, then

$$\varphi(x) = p_n(x)\,h(x-n) + p_{n+1}(x)\,h(x-n-1).$$

Thus, for n ≤ x ≤ n + 1, we get

$$\begin{aligned}
|f(x) - \varphi(x)| &= \big|h(x-n)[f(x) - p_n(x)] + h(x-n-1)[f(x) - p_{n+1}(x)]\big| \\
&\le h(x-n)\,|f(x) - p_n(x)| + h(x-n-1)\,|f(x) - p_{n+1}(x)| \\
&< \varepsilon,
\end{aligned}$$

since h ≥ 0 and h(x − n) + h(x − n − 1) = 1. □
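The gluing step can be watched in action. In the Python sketch below, which is our own illustration, the local polynomials p_n are Taylor polynomials of sin x of degree 7 about x = n. These stand in for the polynomials supplied by Theorem 11.3: by the Lagrange remainder, each has error at most 1/8! on [n − 1, n + 1]. The glued function φ inherits the same error bound, exactly as in the estimate above. All names are hypothetical.

```python
import math

def flat(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

def bump(x):
    return flat(x + 1) * flat(1 - x)

def h(x):
    """Partition-of-unity function of Lemma 11.12 (h = g / G)."""
    m = math.floor(x)
    return bump(x) / sum(bump(x - n) for n in range(m - 1, m + 3))

def taylor_sin(center, degree=7):
    """Hypothetical local approximant p_n: the Taylor polynomial of sin about
    `center`. On [center - 1, center + 1] its error is at most 1/8!."""
    cycle = [math.sin(center), math.cos(center), -math.sin(center), -math.cos(center)]
    coeffs = [cycle[j % 4] / math.factorial(j) for j in range(degree + 1)]
    return lambda x: sum(c * (x - center) ** j for j, c in enumerate(coeffs))

def glued(x, local):
    """phi(x) = sum over n of p_n(x) h(x - n); for n <= x <= n + 1 only the
    terms n and n + 1 can be nonzero (Lemma 11.12 (ii))."""
    n = math.floor(x)
    return local(n)(x) * h(x - n) + local(n + 1)(x) * h(x - n - 1)

xs = [-5 + 0.01 * i for i in range(1001)]
worst = max(abs(math.sin(x) - glued(x, taylor_sin)) for x in xs)
print(worst, "<=", 1 / math.factorial(8))
```

The weights h(x − n) and h(x − n − 1) form a convex combination, so the glued error can never exceed the worse of the two local errors.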


EXERCISES 

41. Given a < b, modify the construction in Lemma 11.11 to find a function φ ∈ C^∞(R) with φ(x) = 0 for x ∉ (a, b) and φ(x) > 0 for x ∈ (a, b).

42. Given a < b, show that there is a ψ ∈ C^∞(R) such that ψ(x) = 0 for x ≤ a, 0 < ψ(x) < 1 for a < x < b, and ψ(x) = 1 for x ≥ b. [Hint: Consider ψ(x) = c ∫_a^x φ, where φ is as in Exercise 41.]

43. Given a < b and ε > 0, show that there is a function φ ∈ C^∞(R) such that φ(x) = 0 for x ∉ [a − ε, b + ε], φ(x) = 1 for x ∈ [a, b], and 0 < φ(x) < 1 otherwise.

▻ 44. Let h be the function constructed in Lemma 11.12. Given any integer n ∈ Z and any positive integer k ∈ N, show that Σ_{i=n}^{n+k} h(x − i) = 1 for n ≤ x ≤ n + k.


Equicontinuity

We next turn our attention to the second question raised at the beginning of the chapter: 
Given a compact metric space X, what are the compact subsets of C(X)? Since C(X) is 
complete, we know that this is the same as asking: What are the totally bounded subsets 
of C(X)? (Because the compact sets in C(X) are just the closures of the totally bounded 
sets.) If we recall the Bolzano-Weierstrass characterization of total boundedness, we 
can rephrase the question yet again to read: When does a (uniformly) bounded sequence 
in C(X) have a (uniformly) convergent subsequence? We will see in this section that this last question is asking for the missing ingredient in the formula

pointwise convergence + ??? = uniform convergence.


To begin, let’s make a few easy observations. Recall that, throughout this chapter, unless 
otherwise specified, X denotes a compact metric space. 


Examples 11.14 

(a) If (f_n) is a uniformly convergent sequence in C(X), and if f_n ⇉ f on X, then the set {f} ∪ {f_n : n ≥ 1} is compact in C(X). (Why?)

(b) A collection of real-valued functions 𝓕 on (a set) X is said to be uniformly bounded if the set {f(x) : x ∈ X, f ∈ 𝓕} is bounded (in R), that is, if sup_{f∈𝓕} sup_{x∈X} |f(x)| = sup_{f∈𝓕} ‖f‖∞ < ∞. In other words, uniformly bounded means bounded in the metric of B(X) (or C(X)). Clearly, any uniformly convergent sequence in B(X) is uniformly bounded.

The point to Example 11.14 (a) is that we already know some easy compact subsets of C(X), and Example 11.14 (b) is reminding us that boundedness is a necessary condition for compactness (or total boundedness). But, as you might suspect, a totally bounded set should be something more than merely bounded. The extra ingredient here is called equicontinuity.

Let 𝓕 be a collection of real-valued continuous functions on a metric space X. If, given ε > 0, a single δ can always be chosen to "work" (in the ε-δ definition of continuity) simultaneously for every f ∈ 𝓕 and every x ∈ X, then 𝓕 is called equicontinuous (or, sometimes, uniformly equicontinuous). That is, 𝓕 is equicontinuous if, given ε > 0, there is a δ > 0 such that whenever x, y ∈ X satisfy d(x, y) < δ, we then have |f(x) − f(y)| < ε for all f ∈ 𝓕. In short, an equicontinuous collection of functions is "uniformly uniformly continuous."
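A quick numerical contrast may help fix the definition. This is our own illustration, with hypothetical names. Each f_n(x) = x^n is uniformly continuous on [0, 1], yet the family is not equicontinuous: for any fixed δ, the gap |f_n(1) − f_n(1 − δ)| approaches 1 as n grows. By contrast, for the class Lip_K α of Examples 11.15 (b), the single choice δ = (ε/K)^{1/α} works for every member at once.

```python
def worst_gap(delta, n_max=100_000):
    """max over 1 <= n <= n_max of |f_n(1) - f_n(1 - delta)| for f_n(x) = x**n."""
    return max(1.0 - (1.0 - delta) ** n for n in range(1, n_max + 1))

def lip_delta(eps, K, alpha):
    """A delta that works for every f in Lip_K(alpha):
    |x - y| < (eps/K)**(1/alpha) implies |f(x) - f(y)| <= K|x - y|**alpha < eps."""
    return (eps / K) ** (1.0 / alpha)

for d in (0.1, 0.01, 0.001):
    print(d, worst_gap(d))
print(lip_delta(0.1, 2.0, 0.5))
```

No δ, however small, pushes the first family's worst gap below ε = 1/2. That failure is exactly what equicontinuity rules out.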

Examples 11.15 

(a) Clearly, any finite subset of C(X) is equicontinuous. (Why?) Also note that any 
subset of an equicontinuous set of functions is again equicontinuous. 

(b) Given 0 < K < ∞ and 0 < α ≤ 1, recall that Lip_K α is the collection of all f ∈ C[0, 1] that satisfy |f(x) − f(y)| ≤ K|x − y|^α for x, y ∈ [0, 1]. It is easy to see that Lip_K α is equicontinuous. (Why?) But Lip_K α is not totally bounded, since it is not bounded in C[0, 1] (it always contains the constant functions).


EXERCISES 

45. A collection of real-valued functions 𝓕 on (a set) X is said to be pointwise bounded if, for each x ∈ X, the set {f(x) : f ∈ 𝓕} is bounded (in R), that is, if sup_{f∈𝓕} |f(x)| < ∞ for each x ∈ X. If (f_n) is a pointwise convergent sequence of real-valued functions, show that (f_n) is also pointwise bounded.
46. Prove that a uniformly bounded collection of functions is also pointwise
bounded. Give an example of a collection of functions that is pointwise bounded 
but not uniformly bounded. 

47. If a sequence (f_n) in B[a, b] is pointwise bounded, show that some subsequence of (f_n) converges pointwise on the set of rationals in [a, b]. [Hint: Diagonalize!]

48. Let X be a compact metric space. Prove that an equicontinuous subset of C(X) 
is pointwise bounded if and only if it is uniformly bounded. 

49. A collection 𝓕 of real-valued continuous functions on a metric space X is said to be equicontinuous at a point x ∈ X if, for each ε > 0, there is a single δ > 0 that "works" at x for every f ∈ 𝓕. That is, 𝓕 is equicontinuous at x if, given ε > 0, there is a δ > 0, which may depend on x, such that whenever y ∈ X satisfies d(x, y) < δ, then |f(x) − f(y)| < ε for all f ∈ 𝓕. If X is a compact metric space, prove that a subset of C(X) is equicontinuous if and only if it is equicontinuous at each point of X.

50. Show that a bounded subset of C^(1)[a, b] is equicontinuous.

▻ 51. Let X be a compact metric space, and let (f_n) be a sequence in C(X). If (f_n) is uniformly convergent, show that (f_n) is both uniformly bounded and equicontinuous.

▻ 52. Let X be a compact metric space, and let (f_n) be an equicontinuous sequence in C(X). If (f_n) is pointwise convergent, prove that, in fact, (f_n) is uniformly convergent.

53. Let X be a compact metric space, and let (f_n) be a sequence in C(X). If (f_n) decreases pointwise to 0, show that (f_n) is equicontinuous. [Hint: Exercise 49.] Combine this observation with the result in Exercise 52 to give another proof of Dini's theorem (Exercise 10.18).

54. Let X be a compact metric space, and let (f_n) be an equicontinuous sequence in C(X). Show that C = {x ∈ X : (f_n(x)) converges} is a closed set in X.

55. If (f_n) is an equicontinuous sequence in C[a, b], and if (f_n(x)) converges at each rational in [a, b], prove that (f_n) is uniformly convergent on [a, b]. [Hint: Exercises 54 and 52.]

56. (Arzelà-Ascoli, utility grade): If (f_n) is an equicontinuous, pointwise bounded sequence in C[a, b], then some subsequence of (f_n) converges uniformly on [a, b]. [Hint: Exercises 47 and 55.]


Lemma 11.16. If 𝓕 is a totally bounded subset of C(X), then 𝓕 is uniformly bounded and equicontinuous.

Proof. Since a totally bounded set is necessarily also (uniformly) bounded, we only have to prove that 𝓕 is equicontinuous. So, let ε > 0.

Since 𝓕 is totally bounded, it has a finite ε/3-net; that is, there exist f_1, ..., f_n such that each f ∈ 𝓕 satisfies ‖f − f_i‖∞ < ε/3 for some i. Since the set {f_1, ..., f_n} is equicontinuous, there is a δ > 0 such that |f_i(x) − f_i(y)| < ε/3 for every i = 1, ..., n whenever d(x, y) < δ. We now claim that this same δ "works" for every f ∈ 𝓕. Indeed, given f ∈ 𝓕, first choose i such that ‖f − f_i‖∞ < ε/3. Then, given x and y with d(x, y) < δ, we have

|f(x) − f(y)| ≤ |f(x) − f_i(x)| + |f_i(x) − f_i(y)| + |f_i(y) − f(y)|
< ε/3 + ε/3 + ε/3 = ε.

Thus, 𝓕 is equicontinuous. □


Corollary 11.17. If (f_n) is a uniformly convergent sequence in C(X), then (f_n) is uniformly bounded and equicontinuous.

Lemma 11.16 essentially characterizes the compact subsets of C(X).

The Arzelà-Ascoli Theorem 11.18. Let X be a compact metric space, and let 𝓕 be a subset of C(X). Then 𝓕 is compact if and only if 𝓕 is closed, uniformly bounded, and equicontinuous.

Proof. The forward implication follows from Lemma 11.16; that is, a compact subset of C(X) is necessarily closed, uniformly bounded, and equicontinuous. We need to prove the backward implication. So, suppose that 𝓕 is closed, uniformly bounded, and equicontinuous, and let (f_n) be a sequence in 𝓕. We need to show that (f_n) has a uniformly convergent subsequence.

First note that (f_n) is equicontinuous. (Why?) Thus, given ε > 0, there is a δ > 0 such that if d(x, y) < δ, then |f_n(x) − f_n(y)| < ε/3 for all n.

Next, since X is totally bounded, X has a finite δ-net; there exist x_1, ..., x_k ∈ X such that each x ∈ X satisfies d(x, x_i) < δ for some i. Now, since (f_n) is also uniformly bounded (why?), each of the sequences (f_n(x_i))_{n=1}^∞ is bounded (in R) for i = 1, ..., k. Thus, by passing to a subsequence of the f_n (and relabeling), we may suppose that (f_n(x_i))_{n=1}^∞ converges for each i = 1, ..., k. (How?) In particular, we can find some N such that |f_m(x_i) − f_n(x_i)| < ε/3 for any m, n ≥ N and any i = 1, ..., k.

And now we are done! Given x ∈ X, first find i such that d(x, x_i) < δ, and then, whenever m, n ≥ N, we will have

|f_m(x) − f_n(x)| ≤ |f_m(x) − f_m(x_i)| + |f_m(x_i) − f_n(x_i)| + |f_n(x_i) − f_n(x)|
< ε/3 + ε/3 + ε/3 = ε.

That is, ‖f_m − f_n‖∞ < ε for all m, n ≥ N, since our choice of N does not depend on x. Because the subsequence we passed to depends on ε, we finish with a diagonal argument: repeating the construction for ε = 1, 1/2, 1/3, ..., each time passing to a further subsequence, and then taking the diagonal subsequence yields a subsequence of (f_n) that is uniformly Cauchy. Since C(X) is complete and 𝓕 is closed in C(X), this subsequence converges uniformly to some f ∈ 𝓕. □

Compare the following result to Exercise 56. 
Corollary 11.19. Let X be a compact metric space. If (f_n) is a uniformly bounded, equicontinuous sequence in C(X), then some subsequence of (f_n) converges uniformly on X.


EXERCISES 

57. Suppose that f_n : [a, b] → R is a sequence of differentiable functions satisfying |f_n(x)| ≤ 1 and |f_n′(x)| ≤ 1 for all n and x. Prove that some subsequence of (f_n) is uniformly convergent.

58. For K and α fixed, show that {f ∈ Lip_K α : f(0) = 0} is a compact subset of C[0, 1].

59. For each n, show that {f ∈ Lip 1 : ‖f‖_{Lip 1} ≤ n} is a compact subset of C[0, 1]. Use this to give another proof that C[0, 1] is separable. [Hint: See Exercises 24 and 26.]

60. If (f_n) is an equicontinuous sequence in C^(1)[a, b], is it necessarily true that the sequence of derivatives (f_n′) is uniformly bounded? Explain.

61. For the sake of a characterization that is easier to test, it is convenient to weaken one of the hypotheses in the Arzelà-Ascoli theorem. Given a compact metric space X and a subset 𝓕 of C(X), prove that 𝓕 is compact if and only if 𝓕 is closed, pointwise bounded, and equicontinuous. [Hint: Just repeat the proof of Theorem 11.18!]

62. Let X be a compact metric space, and let 𝓕 be a subset of C(X).

(a) If 𝓕 is pointwise bounded, prove that the closure of 𝓕 in C(X) is also pointwise bounded.

(b) If 𝓕 is uniformly bounded, prove that the closure of 𝓕 in C(X) is also uniformly bounded.

(c) True or false? If 𝓕 is equicontinuous, then the closure of 𝓕 in C(X) is also equicontinuous.

63. Define T : C[a, b] → C[a, b] by (Tf)(x) = ∫_a^x f. Show that T maps bounded sets into uniformly bounded, equicontinuous (and hence relatively compact) sets. [Hint: Tf is Lipschitz with constant ‖f‖∞.]

64. Let (f_n) be a sequence in C[a, b] with ‖f_n‖∞ ≤ 1 for all n and define F_n(x) = ∫_a^x f_n(t) dt. Show that some subsequence of (F_n) is uniformly convergent.

65. Let K(x, t) be a continuous function on the square [a, b] × [a, b].

(a) Given f ∈ C[a, b], show that g(x) = ∫_a^x f(t) K(x, t) dt defines a continuous function g ∈ C[a, b].

(b) Define T : C[a, b] → C[a, b] by (Tf)(x) = ∫_a^x f(t) K(x, t) dt. Show that T maps bounded sets into equicontinuous sets. In particular, T is continuous.

66. Suppose that F : R² → R is continuous and Lipschitz in its second variable:
|F(r, s) − F(r, t)| ≤ K|s − t|.

(a) If f ∈ C[a, b], show that g(x) = ∫_a^x F(t, f(t)) dt defines a continuous function g ∈ C[a, b]. [Hint: F is bounded on rectangles.]

(b) Define T : C[a, b] → C[a, b] by (Tf)(x) = ∫_a^x F(t, f(t)) dt. Show that T is continuous. [Hint: T is not linear, but it is Lipschitz.] Consequently, f ↦ ‖Tf‖∞ achieves a minimum on any nonempty compact set in C[a, b].

(c) Show that T maps bounded sets into equicontinuous sets. [Hint: Estimate the Lipschitz constant of Tf.]


Continuity and Category 

In Chapter Ten we gave examples showing that the pointwise limit of a sequence of 
continuous functions need not be everywhere continuous. And, in general, we know that 
some extra ingredient is needed to ensure such a strong conclusion. But is it possible 
that the pointwise limit of a sequence of continuous functions could be everywhere 
discontinuous? For example, is it possible to express χ_Q as the pointwise limit of a sequence of continuous functions on R?

As it happens, the pointwise limit of a sequence of continuous functions on R must 
have lots of points of continuity. 

The Baire-Osgood Theorem 11.20. Let f_n : R → R be continuous for each n, and suppose that f(x) = lim_{n→∞} f_n(x) exists (as a real number) for each x ∈ R. Then D(f) is a first category set in R. In particular, f is continuous at a dense set of points in R.

proof. From Theorem 9.2 we know that D(f) = L£L|{* : <*>/(*) > 1/n] is the 
countable union of closed sets. Thus, it suffices to show, for any e > 0, that the 
closed set £ = [x : (o f (x) > 5e} is nowhere dense. The proof of this fact may 
seem rather indirect, but have patience! 

Consider the sets £„ = r)i,;>»(x : I /»(•*) - fj(x) \ < e}. Since (/„) is pointwise 
convergent, we know that (J“ , F„ = R. Notice, too, that each £„ is a closed set 
(because the f are continuous). 

Given any closed interval /, we want to show that 1 <£ F, for then it will 
follow that £ contains no open intervals either (that is, £ has an empty interior). 
We will take a first step in this direction by applying the Baire category theorem 
to / = U£t,(£„ n /)• Since / is complete, and since each £„ is closed, it follows 
that, for some n, the set £„ n / contains an entire open interval J. We are going 
to show that J c £ c = {* : co/(x) < 5e}, and hence that l <t F. 

Since $J \subset F_n$, we have $|f_n(x) - f_j(x)| \le \varepsilon$ for all $x \in J$ and all $j \ge n$. Thus,
$|f(x) - f_n(x)| \le \varepsilon$ for all $x \in J$. (Why?) Next we use the fact that $f_n$ is continuous:
For each $x_0 \in J$ there is an open interval $I_{x_0} \subset J$, containing $x_0$, such that
$|f_n(x) - f_n(x_0)| < \varepsilon$ for all $x \in I_{x_0}$. But then it follows from the triangle inequality
that $|f(x) - f_n(x_0)| < 2\varepsilon$ for all $x \in I_{x_0}$ and, finally, that $|f(x) - f(y)| < 4\varepsilon$ for
all $x, y \in I_{x_0}$. That is, we have shown that $\omega_f(x_0) \le \sup_{x, y \in I_{x_0}} |f(x) - f(y)| \le 4\varepsilon$, and hence
that $x_0 \notin F$. □
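The proof works with the oscillation $\omega_f(x)$, whose sets $\{x : \omega_f(x) \ge 1/n\}$ make up $D(f)$. A rough numerical sketch, under the assumption that sampling $f$ on a short interval $[x - \delta, x + \delta]$ is a fair stand-in for shrinking neighborhoods: take the largest sampled value minus the smallest. The helper `approx_oscillation` and its sample count are illustrative choices, and the result is only a lower estimate of the oscillation of $f$ over that interval.

```python
def approx_oscillation(f, x, delta, samples=1001):
    """Estimate omega_f(x) by (max - min) of f sampled on [x - delta, x + delta].

    Only finitely many points are used, so this never exceeds the true
    oscillation of f over the interval; shrinking delta approximates omega_f(x).
    """
    pts = [x - delta + 2 * delta * i / (samples - 1) for i in range(samples)]
    values = [f(t) for t in pts]
    return max(values) - min(values)


def step(t):
    # Jump of size 1 at t = 0, so omega_step(0) = 1.
    return 1.0 if t >= 0 else 0.0


def identity(t):
    # Continuous, so its oscillation is 0 at every point.
    return t


print(approx_oscillation(step, 0.0, 1e-3))      # 1.0
print(approx_oscillation(identity, 0.5, 1e-3))  # about 0.002, shrinking with delta
```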
Corollary 11.21. Let $f : \mathbb{R} \to \mathbb{R}$. Then $D(f)$ is a first category set in $\mathbb{R}$ if and
only if $f$ is continuous at a dense set of points.

proof. By Theorem 9.2, $D(f)$ is an $F_\sigma$ set, and an $F_\sigma$ subset of $\mathbb{R}$ is a first category set
if and only if its complement is dense. □

Examples 11.22 

(a) $\chi_{\mathbb{Q}}$ cannot be written as the limit of a sequence of continuous functions. (Why?)
However, we do have $\chi_{\mathbb{Q}}(x) = \lim_{m\to\infty} \lim_{n\to\infty} (\cos m!\,\pi x)^{2n}$.

(b) If $f : \mathbb{R} \to \mathbb{R}$ is everywhere differentiable, then $f'$ must have a point of
continuity, since $f'$ is then the limit of a sequence of continuous functions:
$f'(x) = \lim_{n\to\infty} n\,[f(x + (1/n)) - f(x)]$.
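Both formulas can be made concrete. For (a), if $x = p/q$ is rational and $m \ge q$, then $m!\,x$ is an integer, so $\cos(m!\,\pi x) = \pm 1$ and every power $(\cos m!\,\pi x)^{2n}$ equals 1; otherwise $|\cos(m!\,\pi x)| < 1$ and the inner limit is 0 (for irrational $x$ it is always 0). The sketch below computes that inner limit exactly for rational inputs with Python's `fractions.Fraction`, which sidesteps floating-point cosines of huge arguments; the names `inner_limit` and `difference_quotient` are hypothetical. For (b), `difference_quotient` returns the continuous function $x \mapsto n\,[f(x + 1/n) - f(x)]$.

```python
from fractions import Fraction
from math import factorial


def inner_limit(m, x):
    """lim_{n -> oo} (cos(m! * pi * x))**(2n), for rational x only.

    The limit is 1 exactly when m! * x is an integer (the cosine is +1 or -1)
    and 0 otherwise (then |cos| < 1).
    """
    return 1 if (factorial(m) * Fraction(x)).denominator == 1 else 0


def difference_quotient(f, n):
    """The continuous function x -> n * (f(x + 1/n) - f(x)) from Example (b)."""
    return lambda x: n * (f(x + 1 / n) - f(x))


# (a) For x = 3/4 the inner limit is 0 for m = 1, 2, 3 and 1 from m = 4 on.
print([inner_limit(m, Fraction(3, 4)) for m in range(1, 7)])  # [0, 0, 0, 1, 1, 1]

# (b) For f(x) = x^2, the quotients at x = 1 approach f'(1) = 2.
print(difference_quotient(lambda t: t * t, 1000)(1.0))
```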

Since the subject of derivatives has come up in conjunction with the Baire category 
theorem, now is probably a good time to discuss Banach’s proof of the existence of 
continuous nowhere differentiable functions. Rather than pursue the “hard” technical- 
ities that we saw in Chapter Ten, we will take this as an excuse to demonstrate some of 
the advantages of the “soft” approach. 

To begin, let $F$ denote the set of all functions in $C[0,1]$ having a finite derivative at
some point of $[0,1]$. Banach’s wonderfully clever observation is that $F$ is a first category
set in (the complete space) $C[0,1]$. Since this means that the complement of $F$ is dense
in $C[0,1]$, it would be fair to say that “most” continuous functions on $[0,1]$ fail to
have a finite derivative at even a single point. Isn’t this curious? Without displaying
a single concrete example, Banach’s observation shows that nondifferentiability is the
rule, rather than the exception, for elements of $C[0,1]$.

For each $n \ge 2$, consider the set $E_n$ consisting of those $f \in C[0,1]$ such that, for
some $0 \le x \le 1 - (1/n)$, we have $|f(x + h) - f(x)| \le nh$ for all $0 < h < 1 - x$. That is,
$E_n$ consists of the functions whose right-hand difference quotients at some point of
$[0, 1 - (1/n)]$ are all bounded by $n$ in magnitude. The set $E = \bigcup_{n=2}^{\infty} E_n$ consists of all of those
$f \in C[0,1]$ that have bounded right-hand difference quotients at some $x$ in $[0,1)$. In
particular, any $f \in C[0,1]$ having a finite right-hand derivative at even one point in
$[0,1)$ is in $E$. We will show that $E$ is a first category set in $C[0,1]$ by showing that
each $E_n$ is closed and nowhere dense in $C[0,1]$.

First, let’s show that the complement of $E_n$ is dense in $C[0,1]$. Once we have
established that $E_n$ is closed, this will prove that $E_n$ is nowhere dense. Given $\varepsilon > 0$,
we need to show that an arbitrary $g \in C[0,1]$ is within $\varepsilon$ of some $f \notin E_n$. Since the
polygonal functions are dense in $C[0,1]$, it is enough to consider the case where $g$
is polygonal. But now our job is easy: We just argue that we can find a “sawtooth”
function $f$, having right-hand derivatives bigger than $n$ in magnitude, that is within $\varepsilon$
of $g$, as shown in Figure 11.3.
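The sawtooth step can be made explicit. A minimal sketch, assuming $g$ is polygonal with every slope bounded by $L$ in magnitude: add to $g$ a triangle wave of height $\varepsilon$ whose slopes are $\pm M$ with $M > n + L$. Then $\|f - g\|_\infty \le \varepsilon$, and every right-hand slope of $f$ has magnitude at least $M - L > n$. The particular $g$, $n$, $\varepsilon$, and the helper names are illustrative choices, not taken from the text.

```python
import math


def triangle_wave(x, eps, M):
    """Triangle wave of height eps with slopes +M / -M (period 2*eps/M), for x >= 0.

    Returns (value, right_slope): value lies in [0, eps], and right_slope is
    the slope of the linear piece immediately to the right of x.
    """
    period = 2 * eps / M
    phase = math.fmod(x, period)
    if phase < period / 2:
        return M * phase, M          # rising half of a tooth
    return M * (period - phase), -M  # falling half of a tooth


def sawtooth_perturbation(g, g_right_slope, L, n, eps):
    """f = g + triangle wave, for polygonal g whose slopes satisfy |slope| <= L."""
    M = n + L + 1  # any M > n + L works
    def f(x):
        return g(x) + triangle_wave(x, eps, M)[0]
    def f_right_slope(x):
        return g_right_slope(x) + triangle_wave(x, eps, M)[1]
    return f, f_right_slope


# Illustrative polygonal g on [0, 1]: slope 1 on [0, 1/2), slope -1 afterwards.
def g(x):
    return x if x < 0.5 else 1 - x


def g_right_slope(x):
    return 1.0 if x < 0.5 else -1.0


n, eps = 10, 0.05
f, f_slope = sawtooth_perturbation(g, g_right_slope, L=1.0, n=n, eps=eps)
grid = [i / 1000 for i in range(1001)]
print(max(abs(f(x) - g(x)) for x in grid) <= eps + 1e-12)  # True: f is within eps of g
print(min(abs(f_slope(x)) for x in grid) > n)              # True: right slopes exceed n
```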

Next, let’s check that $E_n$ is closed. Suppose that $(f_k)$ is a sequence from $E_n$, and that
$(f_k)$ converges uniformly to some $f$ in $C[0,1]$. We need to show that $f \in E_n$. Now there
is a corresponding sequence $(x_k)$ with $0 \le x_k \le 1 - (1/n)$ such that $|f_k(x_k + h) - f_k(x_k)| \le nh$
for all $0 < h < 1 - x_k$. By passing to a subsequence, if necessary (and relabeling),
we may suppose that $x_k \to x$, where $0 \le x \le 1 - (1/n)$. We will take the corresponding
subsequence of $(f_k)$, too (likewise relabeled). Thus, $f_k \rightrightarrows f$ and $x_k \to x$.
[Figure 11.3, panels (a) and (b).]

If $0 < h < 1 - x$, then $0 < h < 1 - x_k$ for all $k$ sufficiently large. Thus, if $k$ is sufficiently
large, we have

$$\begin{aligned} |f(x + h) - f(x)| &\le |f(x + h) - f(x_k + h)| + |f(x_k + h) - f_k(x_k + h)| \\ &\quad + |f_k(x_k + h) - f_k(x_k)| + |f_k(x_k) - f(x_k)| + |f(x_k) - f(x)| \\ &\le |f(x + h) - f(x_k + h)| + \|f - f_k\|_\infty \\ &\quad + nh + \|f - f_k\|_\infty + |f(x_k) - f(x)|. \end{aligned}$$

Now, since $f$ is continuous and $f_k \rightrightarrows f$, we just let $k \to \infty$ in our last estimate to arrive
at $|f(x + h) - f(x)| \le nh$. That is, $f \in E_n$.


Notes and Remarks 

Weierstrass’s first theorem, on approximation by algebraic polynomials (Theorem 11.3),
appeared in Weierstrass [1885, pp. 633-639]. His second theorem, on
approximation by trigonometric polynomials (Theorem 11.8), appeared immediately
after the first, in a paper under the same title, in Weierstrass [1885, pp. 789-805]. See
Weierstrass [1886] for a French translation.

A great deal has been written about Weierstrass’s approximation theorems and related 
questions. For a brief historical overview, see Shields [1987a] and Hedrick [1927]. More 
detailed discussions are given in Jackson [1920] and Fisher [1978]. For a short account 
of Weierstrass’s life, see Polubarinova-Kochina [1966]. 

Three highly readable sources for detailed information on the approximation of 
functions are Natanson [1964], Cheney [1966], and Rivlin [1981]. 

The observation that the polygonal functions are dense in C[a,b] (Theorem 11.2)
is due to Lebesgue, as is the fact that this observation can be used to give an elementary
proof of Weierstrass’s first theorem (see Exercises 2 and 11). So is the elementary
proof that Weierstrass’s two theorems are, in fact, equivalent (the proof of Theorem
11.8 and the subsequent discussion). All this and more can be found in Lebesgue’s first
published paper, Lebesgue [1898]. The details, as given here, are based largely on the
presentation in de la Vallée Poussin [1919].
Sergei Bernstein’s proof of the Weierstrass theorem (Theorem 11.4) is from
S. N. Bernstein [1912]. The curious fact that the proof of Bernstein’s theorem rests on
checking just three special cases, the polynomials $f_0(x) = 1$, $f_1(x) = x$, and $f_2(x) = x^2$,
leads to a beautiful result of Korovkin on monotone (or positive) linear operators on
$C[a,b]$. (A linear map $T : C[a,b] \to C[a,b]$ is monotone if $T(f) \le T(g)$ whenever
$f \le g$.) Korovkin’s theorem states that if any sequence $(T_n)$ of monotone linear maps
on $C[a,b]$ satisfies $T_n(f) \rightrightarrows f$ in each of the three cases $f = f_0$, $f = f_1$, and $f = f_2$,
then $T_n(f) \rightrightarrows f$ for every $f \in C[a,b]$. Since the operators $B_n(f)$ are linear and positive
(see Exercise 5), Bernstein’s theorem is a special case of Korovkin’s result. There
is also a version of Korovkin’s theorem for monotone linear maps on $C^{2\pi}$, in which
case the “Korovkin set” $\{1, x, x^2\}$ now becomes $\{1, \cos x, \sin x\}$. For more details, see
Cheney [1966] or Korovkin [1960]. For more recent developments along these lines,
see Donner [1982].
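The three Korovkin test cases are easy to check for Bernstein’s operators. The sketch below evaluates $B_n(f)(x) = \sum_{k=0}^{n} f(k/n)\binom{n}{k} x^k (1-x)^{n-k}$ on $[0,1]$ (the standard form of the Bernstein polynomials, assumed here) and confirms the classical identities $B_n(f_0) = f_0$, $B_n(f_1) = f_1$, and $B_n(f_2)(x) = x^2 + x(1-x)/n$, so that $B_n(f_2) \rightrightarrows f_2$ as $n \to \infty$. The function name `bernstein` is a hypothetical helper.

```python
from math import comb


def bernstein(f, n, x):
    """Value at x in [0, 1] of the n-th Bernstein polynomial B_n(f)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))


n = 10
for x in (0.0, 0.25, 0.5, 0.9):
    b0 = bernstein(lambda t: 1.0, n, x)    # should equal 1
    b1 = bernstein(lambda t: t, n, x)      # should equal x
    b2 = bernstein(lambda t: t * t, n, x)  # should equal x^2 + x(1 - x)/n
    print(abs(b0 - 1) < 1e-12, abs(b1 - x) < 1e-12,
          abs(b2 - (x * x + x * (1 - x) / n)) < 1e-12)
```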

Exercise 16 is taken from my classroom notes from W. B. Johnson’s course in real
analysis at The Ohio State University in 1974-75. The spaces Lip α, for 0 < α < 1, in
Exercises 20-24 and 26 are sometimes referred to as the Hölder continuous functions.

The section on trigonometric polynomials, along with the proof of the equivalence
of Weierstrass’s first and second theorems, is based in part on the presentations found in
de la Vallée Poussin [1919] and Natanson [1964] (and, to some extent, Jackson [1941]
and Rogosinski [1950]) but, as already mentioned, is heavily influenced by Lebesgue’s
original presentation; see also Lebesgue [1906].

Several enlightening proofs of the Weierstrass theorems (especially, deductions of
the first theorem from the second) can be found in Jackson [1941]. In one particularly
direct approach, Jackson points out that if $f$ is a polygonal function in $C^{2\pi}$, then the
Fourier coefficients for $f$ satisfy $|a_k|, |b_k| \le C/k^2$. (Compare this with the result in
Exercise 40.) It follows (see Exercise 39) that each $2\pi$-periodic polygonal function is the
uniform limit of its Fourier series. Since the polygonal functions are clearly dense in
$C^{2\pi}$, this observation gives a quick proof of Weierstrass’s second theorem.

The constructions in Lemmas 11.10 and 11.11, along with Exercise 42, are based
on the presentation in Beals [1973]. Lemma 11.12, Theorem 11.13, and Exercise 44
are based on the presentation in Pursell [1967].

The Italian mathematicians Ascoli and Arzelà were both interested in extending
Cantor’s set theory to sets whose elements were functions, sometimes referred to as
“curves” or “lines,” especially in regard to “functions of lines,” or functions of functions,
if you will. In particular, Arzelà examined the problems of finding necessary
and sufficient conditions for the integrability of the pointwise limit of a sequence of
integrable functions, of finding the correct mode of convergence that would preserve
integrability, and of the validity of term-by-term integration of series.

Ascoli defined the notion of equicontinuity (at a point), and Arzelà used the concept
at about the same time. It would seem that Ascoli proved the sufficiency of this new
condition for compactness in Ascoli [1883] while Arzelà proved the necessity in Arzelà
[1889] (for C[0,1] in either case). But Arzelà is generally credited with the first clear
statement of Theorem 11.18 for C[0,1] in Arzelà [1895]. The metric space version
is (once again) due to Fréchet; see Fréchet [1906]. For more details, see Dunford and
Schwartz [1958] and Hawkins [1970]. Exercise 59 is based on a result in Dudley [1989].
A slightly different version of Theorem 11.20, concerning the set of points of uniform
convergence of a pointwise convergent sequence of functions, was established
in Osgood [1897]. For more on Osgood’s approach, see Hobson [1927, Vol. II]. As
stated here, Theorem 11.20 is part of Baire’s thesis, Baire [1899]. The proof given here,
along with Corollary 11.21 and Example 11.22, is taken from Oxtoby [1971]. For a
discussion of related issues, see Hewitt [1960], Goffman [1960], and Myerson [1991].

Banach’s clever application of the Baire category theorem to prove the existence
of continuous nowhere differentiable functions is from Banach [1931]. The proof presented
here is taken from Oxtoby [1971] (but see also Boas [1960]). Applications of
the Baire category theorem to existence proofs are numerous; both Oxtoby and Boas
provide several other curious examples. Two particular examples, though, are simply
too curious not to mention. Compare “Most monotone functions are singular,”
Zamfirescu [1981], and “Most monotone functions are not singular,” Cater [1982].
Katsuura [1991] offers an intriguing application of Banach’s contraction mapping theorem
to address the existence of nowhere differentiable functions.
Algebras and Lattices

We continue with our study of $B(X)$, the space of bounded real-valued functions on a
set $X$. As we have seen, $B(X)$ is a Banach space when supplied with the norm $\|f\|_\infty =
\sup_{x \in X} |f(x)|$. Moreover, convergence in $B(X)$ is the same as uniform convergence. Of
course, if $X$ is a metric space, we will also be interested in $C(X)$, the space of continuous
real-valued functions on $X$, and its cousin $C_b(X) = C(X) \cap B(X)$, the closed subspace
of bounded continuous functions in $B(X)$. Finally, if $X$ is a compact metric space, recall
that $C_b(X) = C(X)$.

But now we want to add a few more ingredients to the recipe: It’s time we made use 
of the algebraic and lattice structures of B(X). In this chapter we will make formal our 
earlier informal discussions of algebras and lattices. In particular, we will see how this 
additional structure leads to a generalization of the Weierstrass approximation theorem 
in C(X), where X is a compact metric space. 

To begin, an algebra is a vector space $A$ on which there is defined a multiplication
$(f, g) \mapsto fg$ (from $A \times A$ into $A$) satisfying

(i) $(fg)h = f(gh)$, for all $f, g, h \in A$;

(ii) $f(g + h) = fg + fh$ and $(f + g)h = fh + gh$, for all $f, g, h \in A$;

(iii) $\alpha(fg) = (\alpha f)g = f(\alpha g)$, for all scalars $\alpha$ and all $f, g \in A$.

The algebra is called commutative if

(iv) $fg = gf$, for all $f, g \in A$.

And we say that $A$ has an identity element if there is a vector $e \in A$ such that

(v) $fe = ef = f$, for all $f \in A$.

In the case where $A$ is a normed vector space, we also require that the norm satisfy

(vi) $\|fg\| \le \|f\|\,\|g\|$

(this simplifies things a bit), and in this case we refer to $A$ as a normed algebra. If a
normed algebra is complete, we refer to it as a Banach algebra. Finally, a subset $B$ of
an algebra $A$ is called a subalgebra (of $A$) if $B$ is itself an algebra (under the same
operations), that is, if $B$ is a (vector) subspace of $A$ that is closed under multiplication.
Examples 12.1

(a) R, with the usual addition and multiplication, is a commutative Banach algebra 
with identity. 

(b) If we define multiplication of vectors “coordinatewise,” then $\mathbb{R}^n$ is a commutative
Banach algebra with identity (the vector $(1, \ldots, 1)$) when equipped with the norm
$\|x\|_\infty = \max_{1 \le i \le n} |x_i|$. We used this observation in Chapter Five.

(c) The collection $M_n(\mathbb{R})$ of all $n \times n$ real matrices, under the usual operations on
matrices, is a noncommutative algebra with identity (for $n \ge 2$).

(d) Under the usual pointwise multiplication of functions, B(X) is a commutative 
Banach algebra with identity (the constant 1 function). The constant functions 
in B(X) form a subalgebra isomorphic (in every sense of the word) to R. 

(e) If $X$ is a metric space, then $C(X)$ is a commutative algebra with identity (the
constant 1 function) and $C_b(X)$ is a closed subalgebra of $B(X)$.

(f) The polynomials form a dense subalgebra of $C[a,b]$. The trig polynomials form
a dense subalgebra of $C^{2\pi}$.

(g) $C^{(1)}[0,1]$ and Lip 1 are dense subalgebras of $C[0,1]$.

(h) $C^\infty(\mathbb{R})$ is a subalgebra of $C(\mathbb{R})$.

(i) A function $f : [a,b] \to \mathbb{R}$ is called a step function if there are finitely many
points $a = t_0 < t_1 < \cdots < t_n = b$ such that $f$ is constant on each of the
open intervals $(t_i, t_{i+1})$. (And $f$ is allowed to take on any arbitrary real values at
the $t_i$.) We will write $S[a,b]$ for the collection of all step functions on $[a,b]$.
Clearly, $S[a,b]$ is a subset of $B[a,b]$ but, in fact, $S[a,b]$ is also a subalgebra
of $B[a,b]$. (Why?)
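Two of these examples are easy to probe numerically. The sketch below (hypothetical helper names, plain Python lists standing in for vectors and matrices) checks the normed-algebra inequality (vi) for the coordinatewise product on $\mathbb{R}^n$ with the max norm from (b), on random samples, and exhibits two $2 \times 2$ matrices that do not commute, as in (c).

```python
import random


def coord_mul(x, y):
    """Coordinatewise product on R^n, as in Example 12.1(b)."""
    return [a * b for a, b in zip(x, y)]


def sup_norm(x):
    """||x||_oo = max_i |x_i|."""
    return max(abs(a) for a in x)


def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    # Inequality (vi): ||xy||_oo <= ||x||_oo ||y||_oo
    assert sup_norm(coord_mul(x, y)) <= sup_norm(x) * sup_norm(y)

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
print(mat_mul(A, B))  # [[1, 0], [0, 0]]
print(mat_mul(B, A))  # [[0, 0], [0, 1]]
```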


EXERCISES 

> 1. Let $V$ be a normed vector space.

(a) Show that scalar multiplication, from $\mathbb{R} \times V$ into $V$, is continuous; that is, if
$a_n \to a$ in $\mathbb{R}$, and if $x_n \to x$ in $V$, prove that $a_n x_n \to ax$ in $V$.

(b) Show that vector addition, from $V \times V$ into $V$, is continuous; that is, if $x_n \to x$
and $y_n \to y$ in $V$, prove that $x_n + y_n \to x + y$ in $V$.

(c) If $W$ is a subspace of $V$, conclude that its closure $\overline{W}$ is a subspace of $V$.

2. Let A be an algebra, and let B be a subset of A. Prove that B is a subalgebra of 
A if and only if B is a (vector) subspace of A that is also closed under multiplication. 

> 3. Let A be a normed algebra. 

(a) Show that $\|fg - hk\| \le \|f\|\,\|g - k\| + \|k\|\,\|f - h\|$ for $f, g, h, k \in A$.

(b) Show that multiplication, from $A \times A$ into $A$, is continuous; that is, if $f_n \to f$
and $g_n \to g$ in $A$, prove that $f_n g_n \to fg$ in $A$.

(c) If $B$ is a subalgebra of $A$, conclude that its closure $\overline{B}$ is a subalgebra of $A$.

4. Show that the only subalgebras of $\mathbb{R}^2$, other than $\{(0, 0)\}$ and $\mathbb{R}^2$, are the sets
$\{(x, 0) : x \in \mathbb{R}\}$, $\{(0, x) : x \in \mathbb{R}\}$, and $\{(x, x) : x \in \mathbb{R}\}$.

> 5. Prove that $S[a,b]$ is a subalgebra of $B[a,b]$.
6. If $X$ is infinite, show that $B(X)$ is not separable.

7. Prove that $C^{(1)}[a,b]$ is a Banach algebra when supplied with the norm
$\|f\|_{C^{(1)}} = \|f\|_\infty + \|f'\|_\infty$. (See Exercise 10.18.)

8. Prove that Lip $\alpha$ is a Banach algebra when supplied with the norm $\|f\|_{\mathrm{Lip}\,\alpha} =
\|f\|_\infty + N_\alpha(f)$. (See Exercise 11.25.)

9. Let $A$ be an algebra with identity $e$, and let $f \in A$. Given a polynomial $p(x) =
\sum_{k=0}^{n} a_k x^k$, we (formally) define $p(f) \in A$ by $p(f) = \sum_{k=0}^{n} a_k f^k$, where $f^0 = e$,
and we call $p(f)$ a polynomial in $f$. Show that the set of all polynomials in $f$
forms a subalgebra of $A$. In fact, prove that the set of polynomials in $f$ is the smallest
subalgebra of $A$ containing $e$ and $f$. For this reason we refer to the set of polynomials in
$f$ as the subalgebra generated by $e$ and $f$. Note that the set of (algebraic) polynomials
in $C[a,b]$, for instance, is the subalgebra of $C[a,b]$ generated by the functions
$e(x) = 1$ and $f(x) = x$.
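The definition of $p(f)$ uses only the algebra operations and the identity $e$, so it can be written once for any algebra. The sketch below evaluates $p(f) = \sum_{k} a_k f^k$ by Horner’s rule, $p(f) = a_0 e + f\,(a_1 e + f\,(a_2 e + \cdots))$, with the multiplication, addition, scalar multiplication, and identity passed in as parameters. The function `poly_in`, its parameter names, and the choice of $2 \times 2$ matrices as the sample algebra are illustrative assumptions.

```python
def poly_in(coeffs, f, mul, add, scale, e):
    """p(f) = sum_k coeffs[k] * f^k in an algebra with identity e, via Horner's rule."""
    result = scale(coeffs[-1], e)
    for a in reversed(coeffs[:-1]):
        result = add(mul(f, result), scale(a, e))
    return result


# Sample algebra: 2 x 2 real matrices, stored as lists of rows.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def mat_scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]


I = [[1, 0], [0, 1]]   # the identity e
N = [[0, 1], [0, 0]]   # N^2 = 0

# p(x) = 3 + 2x + 5x^2, so p(N) = 3I + 2N because N^2 = 0.
print(poly_in([3, 2, 5], N, mat_mul, mat_add, mat_scale, I))  # [[3, 2], [0, 3]]
```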


The Weierstrass approximation theorem tells us that the subalgebra of polynomials
in $C[a,b]$ is dense in $C[a,b]$. Using this language, it is now possible to reformulate the
Weierstrass theorem in more general settings. In particular, our long-term goal in this
chapter is to prove Stone’s extension of the Weierstrass theorem, which characterizes
the dense subalgebras of $C(X)$, where $X$ is a compact metric space.

Our short-term goal will be to characterize $\overline{S}[a,b]$, the closure of the subalgebra of
step functions $S[a,b]$ in the algebra of bounded functions $B[a,b]$. This will give us
at least one nontrivial, and ultimately useful, example for later reference. Please note
that it follows from Exercises 3 and 5 that $\overline{S}[a,b]$ is again a subalgebra of $B[a,b]$. To
begin, let’s check that $\overline{S}[a,b]$ contains the continuous functions.

Lemma 12.2. $C[a,b] \subset \overline{S}[a,b]$.

proof. Let $f \in C[a,b]$ and $\varepsilon > 0$. We need to find a step function $g \in S[a,b]$
such that $\|f - g\|_\infty < \varepsilon$.

Since $f$ is uniformly continuous, there is a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$
whenever $|x - y| < \delta$. Now take any partition $a = t_0 < t_1 < \cdots < t_n = b$ of $[a,b]$
for which $t_{i+1} - t_i < \delta$ for all $i$, and define $g$ by $g(x) = f(t_i)$ for $t_i \le x < t_{i+1}$,
and $g(b) = f(b)$ (see Figure 12.1). Then $g \in S[a,b]$ and $|g(x) - f(x)| < \varepsilon$ for
all $x$ in $[a,b]$. □
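The construction in this proof is directly computable. A minimal sketch, assuming a uniform partition with $n$ subintervals (any partition of mesh less than $\delta$ would do): the step function takes the value $f(t_i)$ on $[t_i, t_{i+1})$ and $f(b)$ at $b$. For $f = \sin$ on $[0, \pi]$, which is 1-Lipschitz, a mesh of $\pi/n$ keeps the error below $\pi/n$. The function name `step_approximation` is hypothetical.

```python
import math
from bisect import bisect_right


def step_approximation(f, a, b, n):
    """Step function g with g = f(t_i) on [t_i, t_{i+1}) and g(b) = f(b),
    for the uniform partition t_i = a + i(b - a)/n."""
    t = [a + i * (b - a) / n for i in range(n + 1)]
    values = [f(ti) for ti in t]

    def g(x):
        if x >= b:
            return f(b)
        i = bisect_right(t, x) - 1  # the index i with t_i <= x < t_{i+1}
        return values[i]

    return g


a, b, n = 0.0, math.pi, 1000
g = step_approximation(math.sin, a, b, n)
grid = [a + (b - a) * j / 10000 for j in range(10001)]
error = max(abs(math.sin(x) - g(x)) for x in grid)
print(error <= (b - a) / n)  # True
```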
EXERCISES

10. Show that $\overline{S}[a,b]$ contains the monotone functions in $B[a,b]$. [Hint: “Slice
up” the range of a monotone function to find an approximating step function.]

11. Let $f(x) = \sin(1/x)$, for $0 < x \le 1$, and $f(0) = 0$. Clearly, $f \in B[0,1]$.
Show that $f \notin \overline{S}[0,1]$. [Hint: $f(0+)$ doesn’t exist.]

12. Is $\chi_{\mathbb{Q} \cap [a,b]} \in \overline{S}[a,b]$? Explain.


What do Exercise 10 and Lemma 12.2 have in common? Well, recall that monotone
functions have left- and right-hand limits at each point; that is, both $f(x+)$ and $f(x-)$
exist if $f$ is monotone. This turns out to be precisely what is needed to be in the closure
of the step functions.
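The hint to Exercise 10 can be sketched in a line: for increasing $f$, rounding the values of $f$ down to the levels $c + k\varepsilon$ changes $f$ by less than $\varepsilon$, and because $f$ is monotone each level set of the rounded function is an interval, so the result is a step function taking finitely many values. The code below (hypothetical names; $f(x) = x^3$ on $[0,1]$ chosen for illustration) only checks the error bound and counts the distinct values on a sample grid; it does not prove the step-function property.

```python
import math


def slice_range(f, c, eps):
    """x -> c + eps * floor((f(x) - c) / eps): f rounded down to the levels c + k*eps."""
    return lambda x: c + eps * math.floor((f(x) - c) / eps)


def f(x):
    return x ** 3  # increasing on [0, 1]


eps = 0.1
g = slice_range(f, f(0.0), eps)
grid = [j / 1000 for j in range(1001)]
print(max(f(x) - g(x) for x in grid) < eps + 1e-12)  # True
print(len({g(x) for x in grid}))                     # 11 distinct levels
```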

Theorem 12.3. Let $f \in B[a,b]$. Then $f \in \overline{S}[a,b]$ if and only if $f(x+)$ and
$f(x-)$ exist at each $x$ in $[a,b]$ (but only $f(a+)$ and $f(b-)$, of course).

proof. First suppose that $f \in \overline{S}[a,b]$, and let $a \le x < b$. We will show that
$f(x+)$ exists (the other case is similar).

Let $\varepsilon > 0$, and choose $g \in S[a,b]$ such that $\|f - g\|_\infty < \varepsilon$. Now, since $g$
is a step function, $g(x+)$ exists; in fact, there is a $\delta > 0$ such that $g$ is constant
on the interval $(x, x + \delta)$. (Why?) But then, for any $x < s, t < x + \delta$, we have
$|f(s) - f(t)| \le |f(s) - g(s)| + |g(s) - g(t)| + |g(t) - f(t)| < 2\varepsilon$, and this is enough
to imply that $f(x+)$ exists. Indeed, if $(t_n)$ decreases to $x$, then this argument shows
that $(f(t_n))$ is Cauchy (and hence converges).

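Now suppose that $f \in B[a,b]$, that $f(x+)$ and $f(x-)$ exist for every $x$ in
$[a,b]$, and that $\varepsilon > 0$. For each $x$ in $[a,b]$ there is a $\delta(x, \varepsilon) > 0$ such that

$$x - \delta(x, \varepsilon) < s, t < x \quad \text{or} \quad x < s, t < x + \delta(x, \varepsilon) \implies |f(s) - f(t)| < \varepsilon.$$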


The intervals $\{(x - \delta(x, \varepsilon), x + \delta(x, \varepsilon)) : x \in [a,b]\}$ form an open cover for
$[a,b]$. This means that we actually need only finitely many to do the job. After
reducing to finitely many such intervals, we list the endpoints and midpoints of
the intervals that lie in $[a,b]$, together with $a$ and $b$, in their natural order; call them $a = t_0 < t_1 < \cdots < t_n = b$:


[Diagram: the finitely many chosen intervals, centered at $x_1, x_2, x_3, \ldots$, with their endpoints and midpoints relabeled $t_1 < t_2 < \cdots$.]


The important thing to notice here is that each interval $(t_i, t_{i+1})$ is a subinterval
of some $(x - \delta(x, \varepsilon), x)$ or of some $(x, x + \delta(x, \varepsilon))$. In either case we have $|f(s) -
f(t)| < \varepsilon$ whenever $s, t \in (t_i, t_{i+1})$.





Now we are ready to define our step function $g$. For each $i = 0, \ldots, n - 1$,
choose $s_i \in (t_i, t_{i+1})$ and set $g(x) = f(s_i)$ for $x \in (t_i, t_{i+1})$. Finally, set $g(t_i) = f(t_i)$
for all $i = 0, \ldots, n$. Clearly, $g \in S[a,b]$ and $\|f - g\|_\infty \le \varepsilon$. □

We will say that a function possessing finite left- and right-hand limits at each point is
quasicontinuous. Thus, $\overline{S}[a,b]$ is the algebra of quasicontinuous functions on $[a,b]$.
A quasicontinuous function has only jump discontinuities. And, since a quasicontinuous
function is the uniform limit of a sequence of step functions on each compact interval
in $\mathbb{R}$, it follows from Exercise 10.14 (or Theorem 10.4) that a quasicontinuous function
has at most countably many points of discontinuity.


EXERCISES 

13. Fill in the missing details from the proof of Theorem 12.3.

14. If $f \in B[a,b]$ has only countably many points of discontinuity, does it follow
that $f \in \overline{S}[a,b]$? Explain.


As it happens, the closed subalgebras of $B(X)$ inherit even more structure than one
might guess. To explain this, it will help if we first formalize the order properties of
$B(X)$.

A lattice is a set $L$, together with a partial order $\le$, in which every pair of elements
has both a least upper bound and a greatest lower bound (back in $L$). That is, given
$f, g \in L$, there exist elements $f \vee g$ (the least upper bound of $f$ and $g$) and $f \wedge g$ (the
greatest lower bound of $f$ and $g$) in $L$ satisfying:

(i) If $f \le h$ and $g \le h$, for some $h \in L$, then $f \vee g \le h$.

(ii) If $h \le f$ and $h \le g$, for some $h \in L$, then $h \le f \wedge g$.

As you might expect, a sublattice is a subset of a lattice that is a lattice in its own 
right (under the same ordering). 

A vector space that is also a lattice (under some given partial order) is called a vector
lattice. In a vector lattice we may decompose each element into its positive and negative
parts: $f = f^+ - f^-$, where

$f^+ = f \vee 0$ and $f^- = -(f \wedge 0)$.

We may also define the absolute value of an element of a vector lattice using the formula
$|f| = f^+ + f^-$. See Figure 12.2.
The notions of a normed vector lattice and a Banach lattice should be clear if you
have read this far. In a normed lattice, we also require that the norm satisfy $\|f\| \le \|g\|$
whenever $|f| \le |g|$. (As in the case of normed algebras, this requirement is used to show that
the lattice operations are continuous.)

Examples 12.4 

(a) Given any set $X$, ordinary set inclusion is a partial order on $\mathcal{P}(X)$, the power set
of $X$; that is, we define $A \le B$ if and only if $A \subset B$. It is easy to see that $\mathcal{P}(X)$ is
also a lattice under this ordering, and that $A \vee B = A \cup B$ and $A \wedge B = A \cap B$.
For this reason, $A \vee B$ is sometimes read as “$A$ join $B$,” and $A \wedge B$ is sometimes
read as “$A$ meet $B$.”

(b) $\mathbb{R}^n$, under “coordinatewise” ordering of vectors (i.e., $x \le y$ if and only if $x_i \le y_i$
for all $i$), is a Banach lattice when equipped with the norm $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$.

(c) $B(X)$ is a Banach lattice under the usual pointwise ordering of functions: $f \le g$
if and only if $f(x) \le g(x)$ for all $x$. In this case, $(f \vee g)(x) = \max\{f(x), g(x)\}$
and $(f \wedge g)(x) = \min\{f(x), g(x)\}$. Notice, too, that $|f|(x) = |f(x)|$.
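In $B(X)$ every lattice identity reduces to a statement about real numbers, so it can be spot-checked pointwise. The sketch below samples two functions on a finite set of points (an illustrative stand-in for $X$; the particular functions are arbitrary choices) and checks $f = f^+ - f^-$, $|f| = f^+ + f^-$, and $2(f \vee g) = f + g + |f - g|$, the identity that lets a subspace closed under $|\cdot|$ also be closed under $\vee$ and $\wedge$.

```python
import math

# A finite sample of points standing in for X (an illustrative assumption).
X = [i / 50 for i in range(-50, 51)]
f = {x: math.sin(3 * x) for x in X}
g = {x: x * x - 0.5 for x in X}


def join(u, v):
    """Pointwise maximum: (u v v)(x) = max{u(x), v(x)}."""
    return {x: max(u[x], v[x]) for x in X}


def meet(u, v):
    """Pointwise minimum: (u ^ v)(x) = min{u(x), v(x)}."""
    return {x: min(u[x], v[x]) for x in X}


zero = {x: 0.0 for x in X}
f_plus = join(f, zero)                                   # f+ = f v 0
f_minus = {x: -val for x, val in meet(f, zero).items()}  # f- = -(f ^ 0)
f_join_g = join(f, g)

ok = all(
    abs(f[x] - (f_plus[x] - f_minus[x])) < 1e-12                             # f = f+ - f-
    and abs(abs(f[x]) - (f_plus[x] + f_minus[x])) < 1e-12                    # |f| = f+ + f-
    and abs(2 * f_join_g[x] - (f[x] + g[x] + abs(f[x] - g[x]))) < 1e-12      # 2(f v g) = f + g + |f - g|
    for x in X
)
print(ok)  # True
```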


EXERCISES 

15. Let $L$ be a lattice, and let $S$ be a subset of $L$. Show that $S$ is a sublattice of $L$ if
and only if $f \vee g$ and $f \wedge g$ are in $S$ whenever $f, g \in S$.

16. In a vector lattice $L$, show that $-(f \wedge g) = (-f) \vee (-g)$, and conclude that
$f^- = (-f) \vee 0 = (-f)^+$.

> 17. If $f, g \in B(X)$, prove that

(a) $f + g = f \vee g + f \wedge g$ and $|f - g| = f \vee g - f \wedge g$.

(b) $2(f \vee g) = f + g + |f - g|$ and $2(f \wedge g) = f + g - |f - g|$.

(c) $f^+ \wedge f^- = 0$ and $|f| = f \vee (-f) = f^+ \vee f^-$.

(d) $|f \vee g| \le |f| \vee |g| \le \max\{\|f\|_\infty, \|g\|_\infty\} \cdot \mathbf{1}$, where $\mathbf{1}$ stands for the constant
1 function.

[Hint: These are all just statements about real numbers.]

> 18. Let $A$ be a vector subspace of $B(X)$. Show that $A$ is a sublattice of $B(X)$ if and
only if $|f| \in A$ whenever $f \in A$. If $X$ is a compact metric space, this gives an easy
proof that $C(X)$ is a sublattice of $B(X)$.

19. If $f, g \in B(X)$, show that $\|f \vee g\|_\infty \le \max\{\|f\|_\infty, \|g\|_\infty\}$.


It follows from Exercise 18, for example, that $S[a,b]$ is a sublattice of $B[a,b]$. It
would be nice to know whether the same holds for $\overline{S}[a,b]$. Our next result explains
the claim, made earlier in this section, that the closed subalgebras of $B(X)$ inherit even
more structure than one might guess.

Theorem 12.5. Let $A$ be a subalgebra of $B(X)$. Then $\overline{A}$ is both a subalgebra
and a sublattice of $B(X)$.
proof. It follows from Exercise 3 that $\overline{A}$ is a subalgebra of $B(X)$. In particular,
$\overline{A}$ is a subspace of $B(X)$. Thus, by Exercise 18, we need only show that $|f| \in \overline{A}$
whenever $f \in \overline{A}$.

Given $f \in \overline{A}$ and $\varepsilon > 0$, we will show that there is an element $g \in \overline{A}$ with
$\|\,|f| - g\,\|_\infty < 2\varepsilon$ and, hence, that $|f|$ lies in the closure of $\overline{A}$, which is $\overline{A}$ itself.

Let $M = \|f\|_\infty$, and consider the function $|t|$ on the interval $[-M, M]$. By the
Weierstrass approximation theorem (or by Exercise 11.11) there is a polynomial
$p(t) = \sum_{k=0}^{n} a_k t^k$ such that $\big|\,|t| - p(t)\big| < \varepsilon$ for all $t$ in $[-M, M]$. In particular,
notice that $|a_0| = |p(0)| < \varepsilon$.

Now, since $|f(x)| \le M$ for all $x \in X$, it follows that $\big|\,|f(x)| - p(f(x))\big| < \varepsilon$ for
all $x \in X$. But $p(f(x)) = a_0 + a_1 f(x) + \cdots + a_n f^n(x) = a_0 + g(x)$, where the
function $g = a_1 f + \cdots + a_n f^n \in \overline{A}$, because $\overline{A}$ is an algebra. Thus,

$$\big|\,|f(x)| - g(x)\big| \le \big|\,|f(x)| - p(f(x))\big| + |p(f(x)) - g(x)| < \varepsilon + |a_0| < 2\varepsilon$$

for all $x \in X$. In other words,
for each $\varepsilon > 0$ we can supply an element $g \in \overline{A}$ such that $\|\,|f| - g\,\|_\infty < 2\varepsilon$. Thus,
$|f| \in \overline{A}$. □
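The proof of Theorem 12.5 can be traced numerically. As a sketch, assuming Bernstein polynomials (Theorem 11.4) as the approximating polynomials: the code builds $p$ close to $|t|$ on $[-M, M]$ through the substitution $t = -M + 2Mu$, reads off $a_0 = p(0)$, and checks that $g = p \circ f - a_0$ (which is $a_1 f + \cdots + a_n f^n$, with no constant term) stays within $\varepsilon + |a_0|$ of $|f|$. For $\phi(u) = M|2u - 1|$, which is $2M$-Lipschitz on $[0,1]$, the standard estimate $|B_n(\phi)(u) - \phi(u)| \le 2M\sqrt{u(1-u)/n} \le M/\sqrt{n}$ supplies a valid $\varepsilon$. The sampled function $f$, the degree $n$, and the helper name are illustrative assumptions.

```python
from math import comb, cos, sqrt


def bernstein_abs(M, n):
    """Degree-n Bernstein polynomial for |t| on [-M, M], via the substitution t = -M + 2Mu."""
    def p(t):
        u = (t + M) / (2 * M)
        return sum(abs(-M + 2 * M * k / n) * comb(n, k) * u**k * (1 - u) ** (n - k)
                   for k in range(n + 1))
    return p


# Sample values of a bounded function f on a finite set of points (illustrative).
f_values = [cos(7 * x) * (1 + x) for x in (i / 200 for i in range(201))]
M = max(abs(v) for v in f_values)
n = 400
p = bernstein_abs(M, n)

eps = M / sqrt(n)                           # Bernstein error bound for |t| on [-M, M]
a0 = p(0.0)                                 # the constant term of p, since p(0) = a_0
g_values = [p(v) - a0 for v in f_values]    # g = a_1 f + ... + a_n f^n, no constant term

err = max(abs(abs(v) - gv) for v, gv in zip(f_values, g_values))
print(abs(a0) < eps, err < eps + abs(a0))   # True True
```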

Please note that the proof of Theorem 12.5 could be streamlined if we had also 
assumed, as some authors do, that A contains the constant functions. The import of this 
and other similar hypotheses will be made clear in the next section. 

Corollary 12.6. Let $X$ be a compact metric space, and let $A$ be a subalgebra of
$C(X)$. Then $\overline{A}$ is both a subalgebra and a sublattice of $C(X)$.

Note that, from Exercise 11.11, the proof of Theorem 12.5 can be written without
reference to the classical Weierstrass theorem. In particular, Corollary 12.6 can be
proved without reference to Theorem 11.3.


EXERCISES 

> 20. Prove Corollary 12.6. 

21. Show that the set of all even functions in $C[-1,1]$ is a proper closed subalgebra
of $C[-1,1]$.

22. Let $X$ be a compact metric space, and let $x_0 \in X$. Show that the set $A =
\{f \in C(X) : f(x_0) = 0\}$ is a proper closed subalgebra of $C(X)$.


The Stone-Weierstrass Theorem 

Using our new terminology, we may restate the classical Weierstrass theorem to
read: If a subalgebra $A$ of $C[a,b]$ contains the functions $e(x) = 1$ and $f(x) = x$,
then $A$ is dense in $C[a,b]$. Any subalgebra of $C[a,b]$ containing 1 and $x$ actually
contains all of the polynomials; thus our restatement of Weierstrass’s theorem
amounts to the observation that any subalgebra containing a dense set is itself dense in
$C[a,b]$.

Our goal in this section is to prove the analogue of this new version of the Weierstrass 
theorem for subalgebras of C(X) where X is a compact metric space. In particular, we 
will want to extract the essence of the functions 1 and x from this statement. That is, 
we seek conditions on a subalgebra A of C(X) that will force A to be dense in C(X). 
The key role played by 1 and x, in the case of C[a, b ], is that a subalgebra containing 
these two functions must actually contain a much larger set of functions. But since we 
cannot be assured of anything remotely like polynomials living in the more general 
C(X) spaces, we might want to change our point of view. What we really need is some 
requirement on a subalgebra A of C(X) that will allow us to construct a wide variety
of functions in A. And, if A contains a sufficiently rich variety of functions, it might 
just be possible to show that A is dense. 

Since the two replacement conditions we have in mind have nothing to do with the
algebraic structure of C(X), we state them in some generality.

Let $A$ be a collection of real-valued functions on some set $X$. We say that $A$ separates
points in $X$ if, given $x \ne y \in X$, there is some $f \in A$ such that $f(x) \ne f(y)$. We say
that $A$ vanishes at no point of $X$ if, given $x \in X$, there is some $f \in A$ such that
$f(x) \ne 0$.

Examples 12.7 

(a) The single function $f(x) = x$ clearly separates points in $[a,b]$, and the function
$e(x) = 1$ obviously vanishes at no point in $[a,b]$. Any subalgebra $A$ of $C[a,b]$
containing these two functions will likewise separate points and vanish at no
point in $[a,b]$.

(b) For any metric space X, the collection C(X) separates points in X and vanishes 
at no point of X. Why? 

(c) The set $\mathcal{E}$ of even functions in $C[-1,1]$ fails to separate points in $[-1,1]$; indeed,
$f(x) = f(-x)$ for any even function. However, since the constant functions are
even, $\mathcal{E}$ vanishes at no point of $[-1,1]$. From Exercise 21, $\mathcal{E}$ is a proper closed
subalgebra of $C[-1,1]$. The set of odd functions will separate points (since
$f(x) = x$ is odd), but the odd functions all vanish at 0. The set of odd functions
is a proper closed subspace of $C[-1,1]$, although not a subalgebra.

(d) The set of all functions $f \in C[-1,1]$ for which $f(0) = 0$ is a proper closed
subalgebra of $C[-1,1]$. In fact, this set is a maximal (in the sense of containment)
proper closed subalgebra of $C[-1,1]$. We will see why shortly. Note, however,
that this set of functions does separate points in $[-1,1]$ (again, because it contains
$f(x) = x$).


As these few examples illustrate, neither of our new conditions, taken separately, 
is sufficient to force a subalgebra of C(X) to be dense. But, as we will see, both 
conditions together will do the job. To better appreciate the utility of these new con- 
ditions, let’s isolate the key computational tool they permit within an algebra of func- 
tions. 
Lemma 12.8. Let $A$ be an algebra of real-valued functions on some set $X$, and
suppose that $A$ separates points in $X$ and vanishes at no point of $X$. Then, given
$x \ne y \in X$ and $a, b \in \mathbb{R}$, we can find an $f \in A$ with $f(x) = a$ and $f(y) = b$.

proof. Since $A$ separates points in $X$ and vanishes at no point of $X$, we can
find $g, h, k \in A$ such that $g(x) \ne g(y)$, $h(x) \ne 0$, and $k(y) \ne 0$. Thus, both
$u = gh - g(y)h$ and $v = gk - g(x)k$ are in $A$, since $A$ is an algebra. Moreover, $u$
and $v$ satisfy $u(y) = 0 = v(x)$ and $u(x) \ne 0 \ne v(y)$ (indeed, $u(x) = [g(x) - g(y)]\,h(x)$
and $v(y) = [g(y) - g(x)]\,k(y)$). Finally, the function

$$f = \frac{a}{u(x)}\,u + \frac{b}{v(y)}\,v$$

is in $A$ and satisfies $f(x) = a$, $f(y) = b$. □
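The formulas in this proof can be run as written. As an illustrative assumption, take $X = [1, 2]$ and let $A$ be the polynomials with zero constant term: $A$ is an algebra that does not contain the constants, it separates points (it contains $t$), and it vanishes at no point of $X$ (since $t \ne 0$ there). With $g = h = k$ equal to $t \mapsto t$, the sketch forms $u = gh - g(y)h$ and $v = gk - g(x)k$ exactly as in the proof, never using a constant function, so the resulting $f$ again has zero constant term. Functions are represented as Python callables, and the name `interpolate` is hypothetical.

```python
def interpolate(g, h, k, x, y, a, b):
    """Build f = (a/u(x)) u + (b/v(y)) v with u = gh - g(y)h and v = gk - g(x)k.

    Requires g(x) != g(y), h(x) != 0 and k(y) != 0, as in the proof of Lemma 12.8.
    """
    gx, gy = g(x), g(y)
    u = lambda t: g(t) * h(t) - gy * h(t)   # u(y) = 0 and u(x) = (g(x) - g(y)) h(x)
    v = lambda t: g(t) * k(t) - gx * k(t)   # v(x) = 0 and v(y) = (g(y) - g(x)) k(y)
    ux, vy = u(x), v(y)
    return lambda t: a / ux * u(t) + b / vy * v(t)


ident = lambda t: t   # a member of A = {polynomials with zero constant term} on [1, 2]
f = interpolate(ident, ident, ident, x=1.25, y=1.75, a=-3.0, b=4.0)
print(f(1.25), f(1.75))  # approximately -3.0 and 4.0
```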

Note that we were forced to be somewhat fussy in the proof of Lemma 12.8; it
would not have been appropriate to write $u = [g - g(y)]h$, for example, since $A$ need
not contain the constant function $g(y) = g(y) \cdot 1$ and so need not contain the factor
$g - g(y)$. To avoid just this sort of nuisance, some authors require that $A$ contain the
constant functions in place of the (weaker) condition that $A$ vanish at no point of $X$.

A second, slick proof of Lemma 12.8 is based on the observation that, for any pair
of distinct points $x \ne y \in X$, the set $\tilde{A} = \{(g(x), g(y)) : g \in A\}$ is a subalgebra of $\mathbb{R}^2$.
(It is easy to list all of the subalgebras of $\mathbb{R}^2$; see Exercise 4.) If $A$ separates points in
$X$, then $\tilde{A}$ is evidently neither $\{(0, 0)\}$ nor $\{(x, x) : x \in \mathbb{R}\}$. If $A$ vanishes at no point,
then both $\{(x, 0) : x \in \mathbb{R}\}$ and $\{(0, x) : x \in \mathbb{R}\}$ are excluded. Thus $\tilde{A} = \mathbb{R}^2$, which is
essentially the conclusion of Lemma 12.8.

Finally, we are ready for Stone’s version of the Weierstrass theorem. It should be 
pointed out that the theorem, as stated, does not hold for algebras of complex-valued 
functions over C. More on this later. 

The Stone-Weierstrass Theorem, real scalars 12.9. Let X be a compact metric
space, and let A be a subalgebra of C(X). If A separates points in X and vanishes
at no point of X, then A is dense in C(X).

proof. First notice that we may assume that A is closed (and prove that A = C(X)). Indeed, if A satisfies the hypotheses of the theorem, then so does its closure Ā. (Why?) And if we are allowed to assume that A is closed, then, according to Corollary 12.6, we may also assume that A is a sublattice of C(X). We would be foolish to do otherwise: Henceforth, A is a closed subalgebra and a sublattice of C(X). We will break the remainder of the proof into two steps.

Step 1. Given f ∈ C(X), x ∈ X, and ε > 0, there is an element g_x ∈ A with g_x(x) = f(x) and g_x(y) > f(y) − ε for all y ∈ X.

From our “computational” lemma, Lemma 12.8, we know that for each y ∈ X, y ≠ x, we can find an h_y ∈ A so that h_y(x) = f(x) and h_y(y) = f(y), as in Figure 12.3.

[Figure 12.3]

Next, since h_y − f is continuous and vanishes at both x and y, the set U_y = {t ∈ X : h_y(t) > f(t) − ε} is open and contains both x and y. Thus, the sets (U_y)_{y≠x} form an open cover for X. Since X is compact, finitely many U_y suffice, say X = U_{y_1} ∪ ··· ∪ U_{y_n}. Now set g_x = max{h_{y_1}, ..., h_{y_n}}. Because A is a lattice, we have g_x ∈ A. Note that g_x(x) = f(x) since each h_{y_i} agrees with f at x. And g_x > f − ε since, given y ≠ x, we have y ∈ U_{y_i} for some i, and hence h_{y_i}(y) > f(y) − ε.

Step 2. Given f ∈ C(X) and ε > 0, there is an h ∈ A with ‖f − h‖∞ < ε.

From Step 1, for each x ∈ X we can find g_x ∈ A such that g_x(x) = f(x) and g_x(y) > f(y) − ε for all y ∈ X, as in Figure 12.4.

[Figure 12.4]

Now we reverse the process used in Step 1: For each x, the set V_x = {y ∈ X : g_x(y) < f(y) + ε} is open and contains x. Again, since X is compact, X = V_{x_1} ∪ ··· ∪ V_{x_m}. This time, set h = min{g_{x_1}, ..., g_{x_m}} ∈ A. As before, h(y) > f(y) − ε for all y since each g_{x_i}(y) does so, and h(y) < f(y) + ε for all y since at least one g_{x_i}(y) does so. □
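The two-step max/min construction can be tried out numerically. The Python sketch below is only an illustration under assumptions of our own: X = [0, 1], the two-point interpolants are taken to be affine functions (as if A were replaced by the lattice generated by the affine functions), and a finite set of nodes stands in for the finite subcovers. For f(t) = sin 3t one can check that the resulting h satisfies |h − f| ≤ 3/n on [0, 1] when n + 1 equally spaced nodes are used, and that h agrees with f at every node.

```python
import math

def affine_through(s, fs, t, ft):
    """The affine function L with L(s) = fs and L(t) = ft (requires s != t)."""
    slope = (ft - fs) / (t - s)
    return lambda z: fs + slope * (z - s)

def max_min_approx(f, nodes):
    """h(z) = min_i max_{j != i} L_ij(z), where L_ij agrees with f at nodes[i] and nodes[j]."""
    rows = [[affine_through(si, f(si), sj, f(sj)) for sj in nodes if sj != si]
            for si in nodes]

    def h(z):
        # The inner max plays the role of g_x (Step 1); the outer min plays the role of h (Step 2).
        return min(max(L(z) for L in row) for row in rows)

    return h

def f(z):
    return math.sin(3 * z)

grid = [k / 500 for k in range(501)]
for n in (5, 10, 20):
    h = max_min_approx(f, [k / n for k in range(n + 1)])
    print(n, max(abs(f(z) - h(z)) for z in grid))
```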

If we are careful to avoid reference to the classical Weierstrass theorem in the proof of the Stone-Weierstrass theorem (see the remarks following Corollary 12.6), then Theorem 11.3 may be considered a corollary to Theorem 12.9 (recall Example 12.7 (a)).

Corollary 12.10. Given f ∈ C[a, b] and ε > 0, there is a polynomial p such
that ‖f − p‖∞ < ε.
EXERCISES

23. If X and Y are compact, show that the subspace of C(X × Y) spanned by the functions of the form f(x, y) = g(x)h(y), g ∈ C(X), h ∈ C(Y), is dense in C(X × Y).

24. Let K be a compact subset of ℝⁿ. Show that the set of all polynomials (in n variables) is dense in C(K).

25. Let X be a compact metric space containing at least two points, and let A be a proper closed subalgebra of C(X). If A separates points in X, show that there is an x_0 ∈ X such that A = {f ∈ C(X) : f(x_0) = 0}.


We used the classical Weierstrass theorem to prove that C[a,b] is separable. Like- 
wise, the Stone-Weierstrass theorem can be used to show that C(X) is separable where X 
is a compact metric space. While we do not have anything quite so convenient as polyno- 
mials at our disposal, we do, at least, have a familiar collection of functions to work with. 

Given a metric space (X, d) and 0 < K < ∞, we will write Lip_K(X) to denote the collection of all real-valued Lipschitz functions on X, with constant at most K; that is, f : X → ℝ is in Lip_K(X) if |f(x) − f(y)| ≤ K d(x, y) for all x, y ∈ X. And we will write Lip(X) to denote the set of functions that are in Lip_K(X) for some K; in other words, Lip(X) = ⋃_{K=1}^∞ Lip_K(X). It is easy to see that Lip(X) is a subspace of C(X); in fact, if X is compact, then Lip(X) is even a subalgebra of C(X).


EXERCISES 

> 26. If X is compact, show that Lip(X) is a subalgebra of C(X).

27. If f ∈ Lip_K[a, b], show that f can be uniformly approximated by polynomials in Lip_K[a, b].


Clearly, Lip(X) contains the constant functions and so vanishes at no point of X. To see that Lip(X) separates points in X, we use the fact that the metric d is Lipschitz: Given x_0 ≠ y_0 ∈ X, the function f(x) = d(x, y_0) satisfies f(x_0) > 0 = f(y_0). Moreover, f ∈ Lip(X) since

$$ |f(x) - f(y)| = |d(x, y_0) - d(y, y_0)| \le d(x, y). $$

Thus, if X is compact, then, by Theorem 12.9, Lip(X) is dense in C(X).

Now, to see that C(X) is separable for X compact, it suffices (since Lip(X) is dense in C(X)) to show that Lip(X) is separable. To see this, first notice that Lip(X) = ⋃_{K=1}^∞ E_K, where

$$ E_K = \{ f \in C(X) : \|f\|_\infty \le K \text{ and } f \in \mathrm{Lip}_K(X) \}. $$

(Why?) The sets E_K are (uniformly) bounded and equicontinuous. Hence, by the Arzelà-Ascoli theorem, each E_K is compact in C(X). Since compact sets are separable, as are countable unions of compact sets, it follows that Lip(X) is separable.
Corollary 12.11. If X is a compact metric space, then C(X) is separable.

In many texts, the Stone-Weierstrass theorem is used to show that the trig polynomials are dense in C^{2π}. One approach here might be to identify C^{2π} with the closed subalgebra of C[0, 2π] consisting of those functions f that satisfy f(0) = f(2π). Probably easier, though, is to identify C^{2π} with the continuous functions on the unit circle T in the complex plane,

$$ \mathbb{T} = \{ e^{i\theta} : \theta \in \mathbb{R} \} = \{ z \in \mathbb{C} : |z| = 1 \}, $$

by using the identification

$$ f \in C^{2\pi} \longleftrightarrow g \in C(\mathbb{T}), \quad \text{where } g(e^{it}) = f(t). $$

Under this correspondence, the trig polynomials in C^{2π} match up with (certain) polynomials in z = e^{it} and z̄ = e^{−it}. But, as we saw in Chapter Eleven, even if we start with real-valued trig polynomials, we will end up with polynomials in z and z̄ having complex coefficients.
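For a concrete instance of this correspondence, the identities cos kt = (z^k + z̄^k)/2 and sin kt = (z^k − z̄^k)/(2i) turn a real trig polynomial into a polynomial in z and z̄ with complex coefficients. The Python sketch below (function names are illustrative) records the coefficient of z^k under key k and the coefficient of z̄^k under key −k. It then checks that the two forms agree at a few values of t.

```python
import cmath
import math

def trig_to_z_coeffs(a, b):
    """Coefficients of T(t) = a[0] + sum_{k>=1} (a[k] cos kt + b[k] sin kt) as a polynomial
    in z = e^{it} and conj(z); key k >= 0 is z**k, key -k is conj(z)**k. b[0] is ignored."""
    c = {0: complex(a[0])}
    for k in range(1, len(a)):
        # cos kt = (z^k + zbar^k)/2 and sin kt = (z^k - zbar^k)/(2i)
        c[k] = (a[k] - 1j * b[k]) / 2
        c[-k] = (a[k] + 1j * b[k]) / 2
    return c

def eval_trig(a, b, t):
    return a[0] + sum(a[k] * math.cos(k * t) + b[k] * math.sin(k * t) for k in range(1, len(a)))

def eval_z_poly(c, t):
    z = cmath.exp(1j * t)
    return sum(ck * (z**k if k >= 0 else z.conjugate() ** (-k)) for k, ck in c.items())

a, b = [1.0, 2.0, 0.0], [0.0, 3.0, -1.0]
c = trig_to_z_coeffs(a, b)
print(c[1], c[-1])  # complex coefficients, even though a and b are real
for t in (0.0, 0.7, 2.5):
    print(eval_trig(a, b, t), eval_z_poly(c, t))
```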


EXERCISE 

28. The polynomials in z obviously separate points in T and vanish at no point of T. Nevertheless, the polynomials in z (with complex coefficients) are not dense in the space of continuous complex-valued functions on T. To see this, here is a proof that f(z) = z̄ cannot be uniformly approximated by polynomials in z:

(a) If p(z) = Σ_{k=0}^n c_k z^k, show that

$$ \int_0^{2\pi} \overline{f(e^{it})}\, p(e^{it})\, dt = 0. $$

(b) Show that

$$ 2\pi = \int_0^{2\pi} \overline{f(e^{it})}\, f(e^{it})\, dt = \int_0^{2\pi} \overline{f(e^{it})}\, \big[ f(e^{it}) - p(e^{it}) \big]\, dt. $$

(c) Conclude that ‖f − p‖∞ ≥ 1 for any polynomial p. [Hint: Take absolute values in (b) and note that |f| = 1.]
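A discrete version of Exercise 28 can be checked numerically. On the N-th roots of unity, the average of z^j vanishes for 0 < j < N. So if deg p < N − 1, the average of z·p(z) is 0 (the analogue of part (a)), the average of z(z̄ − p(z)) is 1, and therefore max |z̄ − p(z)| ≥ 1 over those points. The Python sketch below uses randomly chosen coefficients and helper names of our own. It is a sanity check, not a substitute for the exercise.

```python
import cmath
import random

def sup_dist_conj(coeffs, N=512):
    """Max over the N-th roots of unity of |conj(z) - p(z)|, where p(z) = sum coeffs[k] z**k."""
    best = 0.0
    for m in range(N):
        z = cmath.exp(2j * cmath.pi * m / N)
        p = sum(c * z**k for k, c in enumerate(coeffs))
        best = max(best, abs(z.conjugate() - p))
    return best

random.seed(0)
trials = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]
          for _ in range(20)]
dists = [sup_dist_conj(cs) for cs in trials]
print(min(dists))  # never below 1 (up to rounding) for polynomials of degree < N - 1
```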


Given the result in Exercise 28, it might make more sense to consider the complex-valued continuous functions on T. We will write C_ℂ(T) to denote the complex-valued continuous functions on T and C_ℝ(T) to denote the real-valued continuous functions on T. Similarly, C_ℂ^{2π} is the space of complex-valued, continuous, 2π-periodic functions on ℝ, while C_ℝ^{2π} stands for the real-valued, continuous, 2π-periodic functions on ℝ. Now, under the identification that we made earlier, we have C_ℂ(T) = C_ℂ^{2π} and C_ℝ(T) = C_ℝ^{2π}. The complex-valued trig polynomials in C_ℂ^{2π} now match up with the full set of polynomials, with complex coefficients, in z = e^{it} and z̄ = e^{−it}. We will use the Stone-Weierstrass theorem to show that these polynomials are dense in C_ℂ(T).

We might as well do this in some generality: Given a compact metric space X, we will write C_ℂ(X) for the set of all continuous, complex-valued functions f : X → ℂ, and we norm C_ℂ(X) by ‖f‖∞ = sup_{x∈X} |f(x)| (where |f(x)| is the modulus of the complex number f(x), of course). C_ℂ(X) is a Banach algebra over ℂ. To make it clear which field of scalars is involved, we will write C_ℝ(X) for the real-valued members of C_ℂ(X). Notice, though, that C_ℝ(X) is nothing other than our old friend C(X) with a new name.
More generally, we will write A_ℂ to denote an algebra, over ℂ, of complex-valued functions and A_ℝ to denote the real-valued members of A_ℂ. It is not hard to see that A_ℝ is then an algebra, over ℝ, of real-valued functions.

Now, if f is in C_ℂ(X), then so is the function f̄(x) = \overline{f(x)} (the complex conjugate of f(x)). This puts

$$ \operatorname{Re} f = \tfrac{1}{2}\,(f + \bar f) \quad \text{and} \quad \operatorname{Im} f = \tfrac{1}{2i}\,(f - \bar f), $$

the real and imaginary parts of f, in C_ℝ(X). Conversely, if g, h ∈ C_ℝ(X), then g + ih ∈ C_ℂ(X).

This simple observation gives us a hint as to how we might apply the Stone-Weierstrass theorem to subalgebras of C_ℂ(X). Given a subalgebra A_ℂ of C_ℂ(X), suppose that we could prove that A_ℝ is dense in C_ℝ(X). Then, given any f ∈ C_ℂ(X), we could approximate Re f and Im f by elements g, h ∈ A_ℝ. But since A_ℝ ⊂ A_ℂ, this means that g + ih ∈ A_ℂ and g + ih approximates f. That is, A_ℂ is dense in C_ℂ(X). Great! And what did we really use here? Well, we need A_ℝ to contain the real and imaginary parts of “most” functions in C_ℂ(X). If we insist that A_ℂ separate points and vanish at no point, then A_ℝ will contain “most” of C_ℝ(X). And to be sure that we get both the real and imaginary parts of each element of A_ℂ, we will insist that A_ℂ contain the conjugates of each of its members: f̄ ∈ A_ℂ whenever f ∈ A_ℂ. That is, we will require that A_ℂ be self-conjugate (or, as some authors say, self-adjoint).

The Stone-Weierstrass Theorem, complex scalars 12.12. Let X be a compact
metric space, and let A_ℂ be a subalgebra, over ℂ, of C_ℂ(X). If A_ℂ separates points
in X, vanishes at no point of X, and is self-conjugate, then A_ℂ is dense in C_ℂ(X).

proof. Again, write A_ℝ for the set of real-valued members of A_ℂ. Since A_ℂ is self-conjugate, A_ℝ contains the real and imaginary parts of every f ∈ A_ℂ:

$$ \operatorname{Re} f = \tfrac{1}{2}\,(f + \bar f) \in A_{\mathbb{R}} \quad \text{and} \quad \operatorname{Im} f = \tfrac{1}{2i}\,(f - \bar f) \in A_{\mathbb{R}}. $$

Moreover, A_ℝ is a subalgebra, over ℝ, of C_ℝ(X). In addition, A_ℝ separates points in X and vanishes at no point of X. Indeed, given x ≠ y ∈ X and f ∈ A_ℂ with f(x) ≠ f(y), we must have at least one of Re f(x) ≠ Re f(y) or Im f(x) ≠ Im f(y). Similarly, f(x) ≠ 0 means that at least one of Re f(x) ≠ 0 or Im f(x) ≠ 0 holds. That is, A_ℝ satisfies the hypotheses of the real-scalar version of the Stone-Weierstrass theorem, Theorem 12.9. Consequently, A_ℝ is dense in C_ℝ(X).

Now, given f ∈ C_ℂ(X) and ε > 0, take g, h ∈ A_ℝ with ‖g − Re f‖∞ < ε/2 and ‖h − Im f‖∞ < ε/2. Then, g + ih ∈ A_ℂ and ‖f − (g + ih)‖∞ < ε. Thus, A_ℂ is dense in C_ℂ(X). □

Corollary 12.13. The polynomials, with complex coefficients, in z and z̄ are dense
in C_ℂ(T).

Note that it follows from the proof of Theorem 12.12 that the real parts of the polynomials Σ_{k=−n}^{n} c_k e^{ikt}, that is, the real trig polynomials, are dense in C_ℝ(T) = C_ℝ^{2π}.
Again, if we are careful to avoid using the classical Weierstrass theorem to prove the Stone-Weierstrass theorem (using Exercise 11.11 in place of Theorem 11.3 in the proof of Theorem 12.5), then we may consider Weierstrass’s second theorem as a corollary to the complex-scalar version of the Stone-Weierstrass theorem.

Corollary 12.14. Given f ∈ C^{2π} and ε > 0, there is a trig polynomial T such
that ‖f − T‖∞ < ε.


Notes and Remarks 

The foundations for the “algebraic” approach to the study of C(X) are in Marshall Stone’s landmark paper, Stone [1937]. It is here that Stone gives his version of the Weierstrass theorem, Theorem 12.9, but it is not easy to find among the dozens of important results in this mammoth, 106-page work! The premise that “C(X) determines X” is taken to its logical conclusion. Specifically, Stone considered such questions as: If C(X) and C(Y) are isomorphic (as rings, or as Banach spaces, for example), does it follow that X and Y are homeomorphic? Which topological properties of X can be attributed to the structural properties of C(X) (and conversely)? Paraphrasing a passage from his introduction: “We obtain a reasonably complete algebraic insight into the structure of C*(X) and its correlation with the structure of the underlying topological space.” Stone later gave a less formal (but still formidable) summary in Stone [1962]. For an informal summary of related results, see Shields [1987a, 1989].

It would probably be fair to say that the study of lattices and their application to analysis (and much more) began with Riesz’s address at the 1928 International Congress of Mathematicians, Riesz [1930] (see also Riesz [1940]), and began in earnest with the appearance of G. Birkhoff’s book Lattice Theory in 1940 (and later editions in 1948 and 1967; see Birkhoff [1940]). For a very brief introduction to the topic, see Birkhoff [1943] and Schaefer [1980].

For more details on algebras, lattices, and rings, as used in analysis and topology, see Simmons [1963], Goffman and Pedrick [1965], Jameson [1974], and the classic Gillman and Jerison [1960].

The proofs of Lemma 12.8 and Theorem 12.9 are largely based on the presentation in Rudin [1953], but see also Douglas [1965] and Folland [1984]. The “slick” proof of Lemma 12.8 is taken from Folland [1984]. The material on Lip(X) and the Stone-Weierstrass theorem is based on the presentation in Dudley [1989].
Functions of Bounded Variation

Throughout this book we’ve encountered the theme that C(X) determines X. Said 
another way, to fully understand X we want to understand C(X) as well. Taking this 
one step further, though, raises a curious question: How are we to understand C(X) 
without knowing something about C(C(X))? If we want to be true to our principles, 
we will have to consider continuous real-valued functions on C(X). If that sounds too 
esoteric to bother with, fear not. As it happens, we need only to consider the continuous 
linear real-valued functions on C(X), and such functions have a simple and altogether 
user-friendly description: Definite integrals! But we’re getting a little ahead of ourselves. 
We’ll talk about integrals in the next chapter. For the present, we’ll content ourselves 
with the study of a class of functions that turns out to be of paramount interest in this 
postponed discussion of integration. 

To motivate the inevitable blur of definitions ahead of us, let’s consider a simple example. Suppose that 𝐟(t) = (x(t), y(t)), for a ≤ t ≤ b, is a “nice” curve. What would we mean by the length of this curve?



Well, we might consider a polygonal approximation to 𝐟, with nodes at a = t_0 < t_1 < ··· < t_n = b (as in Figure 13.1), find the length of the approximating polygon,

$$ \sum_{i=1}^{n} \| \mathbf{f}(t_i) - \mathbf{f}(t_{i-1}) \|_2 , $$

and then define the length of the curve as the limit, or supremum, of these approximate lengths as the partition {t_0, t_1, ..., t_n} gets “bigger.” And why not? If x(t) and y(t) are “reasonable” functions, this definition will work just fine. Keep this idea in mind as we proceed.
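As a quick illustration of polygonal approximation, here is a Python sketch (the helper name polygon_length is ours) for the quarter circle 𝐟(t) = (cos t, sin t), 0 ≤ t ≤ π/2, whose length is π/2. Each partition in the list refines the previous one, and the polygonal lengths increase toward π/2.

```python
import math

def polygon_length(curve, a, b, n):
    """Length of the polygon inscribed in `curve` at n + 1 equally spaced parameter values."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    pts = [curve(t) for t in ts]
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, n + 1))

def quarter_circle(t):
    return (math.cos(t), math.sin(t))

# n = 1, 2, 4, 8, 64: each partition refines the one before it.
lengths = [polygon_length(quarter_circle, 0.0, math.pi / 2, n) for n in (1, 2, 4, 8, 64)]
print(lengths, math.pi / 2)
```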

Given f : [a, b] → ℝ and a partition P = {a = t_0 < t_1 < ··· < t_n = b} of [a, b], we define the variation of f over P by

$$ V(f, P) = \sum_{i=1}^{n} |f(t_i) - f(t_{i-1})|. $$
Notice that this “one-dimensional” variation accounts only for the vertical changes in the graph of f between points in the partition P.

If Q is another partition of [a, b] with Q ⊇ P, we say that Q refines P, or that Q is a refinement of P. In this case we have V(f, Q) ≥ V(f, P). To see why, first suppose that Q = P ∪ {x}, where t_k < x < t_{k+1}. Then,

$$ \begin{aligned} V(f, P) &= \sum_{i \ne k+1} |f(t_i) - f(t_{i-1})| + |f(t_{k+1}) - f(t_k)| \\ &\le \sum_{i \ne k+1} |f(t_i) - f(t_{i-1})| + |f(x) - f(t_k)| + |f(t_{k+1}) - f(x)| \\ &= V(f, Q). \end{aligned} $$

The general case now follows by induction on the number of elements of Q \ P. In particular, since every partition contains the trivial partition {a, b}, we get V(f, Q) ≥ V(f, P) ≥ |f(b) − f(a)| whenever Q ⊇ P.
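The refinement inequality is easy to watch in action. The Python sketch below computes V(f, P) directly from the definition for the arbitrarily chosen function f(t) = t³ − t on [−1, 1]. It starts from the trivial partition {−1, 1} and adds one random point at a time. The recorded values never decrease and never exceed V_{−1}^{1} f = 8/(3√3).

```python
import random

def variation(f, partition):
    """V(f, P) = sum of |f(t_i) - f(t_{i-1})| for a partition given as increasing points."""
    return sum(abs(f(t) - f(s)) for s, t in zip(partition, partition[1:]))

def f(t):
    return t**3 - t  # an arbitrary test function on [a, b] = [-1, 1]

random.seed(1)
P = [-1.0, 1.0]  # the trivial partition {a, b}
values = [variation(f, P)]
for _ in range(10):
    # Q = P together with one new point: a refinement of P
    P = sorted(set(P) | {random.uniform(-1.0, 1.0)})
    values.append(variation(f, P))
print(values)
```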

We define the total variation of f over [a, b] by

$$ V_a^b f = \sup_P V(f, P). $$

If V_a^b f < ∞, we say that f is of bounded variation on [a, b]. In other words, f is of bounded variation on [a, b] if the variations V(f, P) are bounded above, independent of the partition P. This notation may remind you of the definition of the Riemann integral, and that is not entirely coincidental. As we will see, V_a^b f behaves very much like an integral.

Examples 13.1

(a) If f : [a, b] → ℝ is monotone, then V(f, P) = |f(b) − f(a)| for any partition P of [a, b]. (Why?) Thus, f is of bounded variation and V_a^b f = |f(b) − f(a)|.

(b) More generally, any piecewise monotone function f : [a, b] → ℝ is of bounded variation. This means that polygonal functions and polynomials, for example, are of bounded variation (over a bounded interval).

(c) If f : [a, b] → ℝ satisfies |f(x) − f(y)| ≤ K|x − y| for all x, y ∈ [a, b], then f is of bounded variation and V_a^b f ≤ K(b − a). (Why?)

(d) Every step function is of bounded variation. If f is a step function that is constant on each of the intervals (t_i, t_{i+1}), where {t_0, ..., t_n} is a partition of [a, b], then V_a^b f is the sum of all of the left- and right-hand “jumps” in the graph of f, that is, the sum of |f(t_i) − f(t_i+)| and |f(t_i) − f(t_i−)| (where appropriate).

(e) We define the length of the curve 𝐟(t) = (x(t), y(t)), a ≤ t ≤ b, as the supremum of the (two-dimensional) variations Σ_{i=1}^n ‖𝐟(t_i) − 𝐟(t_{i−1})‖_2. Thus, the curve has finite length (or is rectifiable) if and only if both x and y are of bounded (one-dimensional) variation on [a, b]. This follows from the observation that

$$ \max\{ |x(t) - x(s)|,\ |y(t) - y(s)| \} \le \| \mathbf{f}(t) - \mathbf{f}(s) \|_2 \le |x(t) - x(s)| + |y(t) - y(s)|. $$
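To check Example (d) on a specific case, the Python sketch below uses a step function of our own choosing on [0, 3]: f = 0 on [0, 1), f(1) = 2, f = 1 on (1, 2], and f = −1 on (2, 3]. Its one-sided jumps have total size 2 + 1 + 0 + 2 = 5. V(f, P) also equals 5 for the two partitions tried below, both of which contain the jump points 1 and 2.

```python
def variation(f, partition):
    """V(f, P) for a partition given as an increasing list of points."""
    return sum(abs(f(t) - f(s)) for s, t in zip(partition, partition[1:]))

def step(t):
    """Hypothetical step function on [0, 3] (values as in the lead-in)."""
    if t < 1:
        return 0
    if t == 1:
        return 2
    if t <= 2:
        return 1
    return -1

# (f(t-), f(t), f(t+)) at the jump points t = 1 and t = 2
jumps = [(0, 2, 1), (1, 1, -1)]
jump_total = sum(abs(val - left) + abs(right - val) for left, val, right in jumps)

coarse = [0, 1, 2, 3]
fine = [k / 4 for k in range(13)]
print(jump_total, variation(step, coarse), variation(step, fine))
```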

We will write BV[a, b] for the collection of all functions of bounded variation on 
[a,b]. You won’t be surprised to learn that BV[a,b] is both a Banach space and a 
Banach algebra, but you may find it curious that we have more than a little work ahead 
of us to establish these facts. In fact, it is probably not at all clear at this point that BV[a, b] ⊂ B[a, b].

Lemma 13.2. If f : [a, b] → ℝ is of bounded variation, then f is also bounded
and satisfies ‖f‖∞ ≤ |f(a)| + V_a^b f.

proof. Let a < x < b, and set P = {a, x, b}. Then, |f(x) − f(a)| ≤ V(f, P) ≤ V_a^b f. Consequently, |f(x)| ≤ |f(a)| + V_a^b f. □

But even bounded continuous functions need not be of bounded variation. Here’s an example: Define f(x) = x sin(1/x) for 0 < x ≤ 1 and f(0) = 0. Then, f ∈ C[0, 1] ⊂ B[0, 1], but f ∉ BV[0, 1]. To see this, fix n, and let P be any partition of [0, 1] containing the points t_k = 2/[(2k + 1)π], for k = 0, ..., n. Notice that f(t_k) = (−1)^k t_k, and so

$$ |f(t_{k+1}) - f(t_k)| = t_{k+1} + t_k > 2t_{k+1} \ge \frac{4}{3\pi} \cdot \frac{1}{k+1}. $$

Consequently,

$$ V(f, P) > \frac{4}{3\pi} \sum_{k=1}^{n} \frac{1}{k} \to \infty \quad \text{as } n \to \infty. $$
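The estimate above can be watched numerically. This Python sketch (helper names are ours) evaluates V(f, P) for f(x) = x sin(1/x) on the partition formed by 0, 1, and the points t_k = 2/[(2k + 1)π], k = 0, ..., n. It prints each value next to the lower bound (4/3π)(1 + 1/2 + ··· + 1/n); both grow like log n.

```python
import math

def variation(f, points):
    """V(f, P) for a finite set of points, sorted into a partition."""
    pts = sorted(points)
    return sum(abs(f(t) - f(s)) for s, t in zip(pts, pts[1:]))

def f(x):
    return x * math.sin(1 / x) if x != 0 else 0.0

results = []
for n in (10, 100, 1000, 10000):
    P = [0.0, 1.0] + [2 / ((2 * k + 1) * math.pi) for k in range(n + 1)]
    lower = 4 / (3 * math.pi) * sum(1 / k for k in range(1, n + 1))
    results.append((n, variation(f, P), lower))
    print(results[-1])
```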

Now the point to these examples is that BV[a, b] contains several subsets that we know to be dense in C[a, b] (under the sup norm). Thus, C[a, b] is contained in the closure of BV[a, b] under uniform convergence but not in BV[a, b] itself. That is, BV[a, b] is evidently not closed under uniform convergence (and hence is not complete under uniform convergence). So, we might want to consider some norm other than the sup-norm on BV[a, b]. As it happens, the total variation V_a^b f is “almost” a norm.

Lemma 13.3. Let f, g ∈ BV[a, b], and let c ∈ ℝ. Then:

(i) V_a^b f = 0 if and only if f is constant.

(ii) V_a^b (cf) = |c| V_a^b f.

(iii) V_a^b (f + g) ≤ V_a^b f + V_a^b g.

(iv) V_a^b (fg) ≤ ‖f‖∞ V_a^b g + ‖g‖∞ V_a^b f.

(v) V_a^b |f| ≤ V_a^b f.

(vi) V_a^b f = V_a^c f + V_c^b f, for a < c < b.

proof. We will prove (iii) and (vi) and leave the rest as exercises. To begin, let P be a partition of [a, b]. By the triangle inequality, V(f + g, P) ≤ V(f, P) + V(g, P). Hence, V(f + g, P) ≤ V_a^b f + V_a^b g, and (iii) follows.

Next, given any partition Q of [a, c] and any partition R of [c, b], then P = Q ∪ R is a partition of [a, b]. Moreover, V(f, Q) + V(f, R) = V(f, P) ≤ V_a^b f. Since Q and R were arbitrary, it follows that V_a^c f + V_c^b f ≤ V_a^b f. Conversely, if we are given a partition P of [a, b], then Q = (P ∪ {c}) ∩ [a, c] is a partition of [a, c] and R = (P ∪ {c}) ∩ [c, b] is a partition of [c, b]. Thus, V(f, P) ≤ V(f, P ∪ {c}) = V(f, Q) + V(f, R) ≤ V_a^c f + V_c^b f. Hence, V_a^b f ≤ V_a^c f + V_c^b f, which proves (vi). □
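Parts (iv), (v), and (vi) of Lemma 13.3 can be sanity-checked on simple functions. The Python sketch below approximates V_a^b f by V(f, P) for a fine uniform partition P. This is only a lower estimate in general (V(f, P) ≤ V_a^b f), but for the smooth, piecewise monotone functions used here it is close. The function name V and the test functions are our own choices.

```python
import math

def V(f, a=0.0, b=2 * math.pi, n=4000):
    """V(f, P) for the uniform partition P of [a, b] into n pieces; a lower estimate of V_a^b f."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(f(t) - f(s)) for s, t in zip(ts, ts[1:]))

sin, cos = math.sin, math.cos
# (vi) additivity: V_0^{2pi} sin versus V_0^{pi} sin + V_{pi}^{2pi} sin (both near 4)
print(V(sin), V(sin, 0.0, math.pi) + V(sin, math.pi, 2 * math.pi))
# (iv) with ||sin||_inf = ||cos||_inf = 1: V(sin*cos) should not exceed V(cos) + V(sin)
print(V(lambda t: sin(t) * cos(t)), V(cos) + V(sin))
# (v) |f| has no more variation than f
print(V(lambda t: abs(sin(t) - 0.5)), V(lambda t: sin(t) - 0.5))
```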
EXERCISES

1. Show that V_a^b(χ_ℚ) = +∞ on any interval [a, b].

2. Show that S[a, b] ⊂ BV[a, b], where S[a, b] is the collection of step functions on [a, b] (Example 12.1 (i)).

> 3. If f has a bounded derivative on [a, b], show that V_a^b f ≤ ‖f′‖∞(b − a).

4. If f ∈ BV[a, b] and [c, d] ⊂ [a, b], show that f ∈ BV[c, d] and V_c^d f ≤ V_a^b f.

> 5. Complete the proof of Lemma 13.3.

6. We can test several of the inclusions implicit in our discussion up to this point by means of a single family of functions. For α ∈ ℝ and β > 0, set f(x) = x^α sin(x^{−β}), for 0 < x ≤ 1, and f(0) = 0. Show that:

(a) f is bounded if and only if α ≥ 0.

(b) f is continuous if and only if α > 0.

(c) f′(0) exists if and only if α > 1.

(d) f′ is bounded if and only if α ≥ 1 + β.

(e) If α > 0, then f ∈ BV[0, 1] for 0 < β < α and f ∉ BV[0, 1] for β ≥ α.

[Hint: Try a few easy cases first, say α = β = 2.]

7. Suppose that f ∈ B[a, b]. If V_{a+ε}^b f ≤ M for all ε > 0, does it follow that f is of bounded variation on [a, b]? Is V_a^b f ≤ M? If not, what additional hypotheses on f would make this so?

8. If f is a polygonal function on [a, b], or if f is a polynomial, show that V_a^b f = ∫_a^b |f′(t)| dt. (This at least partly justifies our earlier claim that V_a^b f behaves like an integral.) [Hint: In either case, f is piecewise monotone and piecewise differentiable. Thus we have ∫_c^d |f′(t)| dt = ±(f(d) − f(c)) over certain “pieces” [c, d] of [a, b].]

9. If f has a continuous derivative on [a, b], and if P is any partition of [a, b], show that V(f, P) ≤ ∫_a^b |f′(t)| dt. Hence, V_a^b f ≤ ∫_a^b |f′(t)| dt.

10. Suppose that f_n → f pointwise on [a, b]. If each f_n is increasing, show that f is increasing. If each f_n is of bounded variation, does it follow that f is of bounded variation? Explain.

> 11. If f_n → f pointwise on [a, b], show that V(f_n, P) → V(f, P) for any partition P of [a, b]. In particular, if we also have V_a^b f_n ≤ K for all n, then V_a^b f ≤ K too.

12. Here is a variation on Exercise 11: If (f_n) is a sequence in BV[a, b], and if f_n → f pointwise on [a, b], show that V_a^b f ≤ lim inf_{n→∞} V_a^b f_n.


Statements (ii) and (iii) of Lemma 13.3 tell us that BV[a, b] is a vector space, while (iv) at least tells us that BV[a, b] is closed under products (we will improve on this inequality later). Notice, too, that from (v) and Exercise 12.18 it follows that BV[a, b] is a sublattice of B[a, b]. However, it is not true that V_a^b f ≤ V_a^b g whenever |f| ≤ |g|.
For example, if f(1/2) = 1 and f(x) = 0 for x ≠ 1/2, and if g(x) = 1 for all x, then |f| ≤ |g|, but V_0^1 f = 2 while V_0^1 g = 0. In any case, it is clear that V_a^b f defines a seminorm on BV[a, b] (since V_a^b(f − g) = 0 only says that f − g is constant). We won’t need to make much of an adjustment to arrive at a norm. In fact, it is easy to check that

$$ \|f\|_{BV} = |f(a)| + V_a^b f $$

defines a norm on BV[a, b]. From Lemma 13.2 we have ‖f‖∞ ≤ ‖f‖_BV, and hence convergence in BV[a, b] implies uniform convergence.

Theorem 13.4. BV[a, b] is complete under ‖f‖_BV = |f(a)| + V_a^b f.

proof. Let (f_n) be a Cauchy sequence in BV[a, b]. Then, in particular, (f_n) is also Cauchy in B[a, b]. Thus, (f_n) converges uniformly (and pointwise) to some f ∈ B[a, b]. We need to show that f ∈ BV[a, b] and that ‖f − f_n‖_BV → 0. We’ll do both at once.

Let P be any partition of [a, b], and let ε > 0. Now choose N such that ‖f_m − f_n‖_BV < ε whenever m, n ≥ N. Then, from Exercise 11, for any n ≥ N we have

$$ |f(a) - f_n(a)| + V(f - f_n, P) = \lim_{m \to \infty} \Big[ |f_m(a) - f_n(a)| + V(f_m - f_n, P) \Big] \le \sup_{m \ge N} \|f_m - f_n\|_{BV} \le \varepsilon. $$

Since this estimate holds for all P, we have ‖f − f_n‖_BV ≤ ε for any n ≥ N. But if f − f_n ∈ BV[a, b] and f_n ∈ BV[a, b], then f ∈ BV[a, b] too. Of course, our first estimate shows that ‖f − f_n‖_BV → 0. □


EXERCISES 

13. Given a sequence of scalars (c_n) and a sequence of distinct points (x_n) in (a, b), define f(x) = c_n if x = x_n for some n, and f(x) = 0 otherwise. Under what condition(s) is f of bounded variation on [a, b]?

14. Let I(x) = 0 if x < 0 and I(x) = 1 if x ≥ 0. Given a sequence of scalars (c_n) with Σ_{n=1}^∞ |c_n| < ∞ and a sequence of distinct points (x_n) in (a, b], define f(x) = Σ_{n=1}^∞ c_n I(x − x_n) for x ∈ [a, b]. Show that f ∈ BV[a, b] and that V_a^b f = Σ_{n=1}^∞ |c_n|.


For the moment, let’s put aside the “abstract” structure of BV[a, b] and instead focus on a concrete, or intrinsic, characterization of the functions of bounded variation. This characterization will depend heavily on a knowledge of the function V_a^x f. Again, this should remind you of the Riemann integral (and the Fundamental Theorem of Calculus).
Theorem 13.5. Fix f ∈ BV[a, b] and set v(x) = V_a^x f, for a < x ≤ b, and v(a) = 0. Then, both v and v − f are increasing. Consequently, f = v − (v − f) is the difference of two increasing functions.

proof. Although it is clear that v is increasing, the proof is still enlightening, especially if we are willing to go the extra mile.

Given x < y in [a, b], it follows from Lemma 13.3 (vi) that

$$ v(y) - v(x) = V_a^y f - V_a^x f = V_x^y f \ge |f(y) - f(x)| \ge 0. \qquad (13.1) $$

Hence, v is increasing. But, in fact, v(y) − v(x) ≥ f(y) − f(x), too. That is, v − f is also increasing. □
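On a grid, Theorem 13.5 can be seen directly: accumulating |f(t_i) − f(t_{i−1})| gives a discrete stand-in for v(x) = V_a^x f, and both v and v − f come out nondecreasing. The Python sketch below uses the arbitrary test function f(t) = sin 3t + t/2 on [0, 2π]; the names are illustrative.

```python
import math

def cumulative_variation(values):
    """v[i] = V(f, {t_0, ..., t_i}): a grid stand-in for v(x) = V_a^x f."""
    v = [0.0]
    for prev, cur in zip(values, values[1:]):
        v.append(v[-1] + abs(cur - prev))
    return v

ts = [2 * math.pi * k / 1000 for k in range(1001)]
fs = [math.sin(3 * t) + 0.5 * t for t in ts]  # an arbitrary smooth test function
v = cumulative_variation(fs)
w = [vi - fi for vi, fi in zip(v, fs)]        # w = v - f
print(v[-1])  # the grid estimate of V_a^b f
```

Since f = v − w, this exhibits f as a difference of two nondecreasing sequences, the discrete counterpart of the Jordan decomposition that follows.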

On the other hand, since monotone functions are of bounded variation, we get the following.

Corollary 13.6. (Jordan’s Theorem) A function f : [a, b] → ℝ is of bounded variation if and only if f can be written as the difference of two increasing functions.

Corollary 13.7. Each f ∈ BV[a, b] is quasicontinuous. In particular, any f ∈ BV[a, b] has at most countably many points of jump discontinuity.

Corollary 13.8.

$$ S[a, b] \subset BV[a, b] \subset \overline{S[a, b]}, $$

where the closure is taken in B[a, b].

If we improve our first estimate (13.1), we will likewise improve our first corollary. 

Theorem 13.9. Fix f ∈ BV[a, b], and let v(x) = V_a^x f. Then, f is right (left) continuous at x in [a, b] if and only if v is right (left) continuous at x.

proof. One direction is easy. If x < y, then v(y) − v(x) ≥ |f(y) − f(x)|; hence, by taking limits as y → x or as x → y, we get v(x+) − v(x) ≥ |f(x+) − f(x)| and v(y) − v(y−) ≥ |f(y) − f(y−)|. Thus, if v is right (left) continuous at x, then so is f.

Next suppose that f is, say, right continuous at x, where a ≤ x < b. Then, given ε > 0, there is some δ > 0 such that |f(x) − f(t)| < ε/2 whenever x < t < x + δ.

For this same ε, choose a partition P of [x, b] such that V_x^b f − ε/2 < V(f, P). (How?) Now, since adding points to P can only increase V(f, P), we might as well assume that P = {x = t_0 < t_1 < ··· < t_n = b} satisfies x < t_1 < x + δ. Then

$$ V_x^b f - \varepsilon/2 < V(f, P) = |f(x) - f(t_1)| + V(f, \{t_1, \dots, t_n\}) < \varepsilon/2 + V_{t_1}^b f. $$

That is, ε > V_x^b f − V_{t_1}^b f = V_x^{t_1} f = v(t_1) − v(x) ≥ 0, for any x < t_1 < x + δ. So, v is right continuous at x, too. □

Corollary 13.10. f ∈ C[a, b] ∩ BV[a, b] if and only if f can be written as the
difference of two increasing continuous functions.


EXERCISES 

15. Show that f ∈ C[a, b] ∩ BV[a, b] if and only if f can be written as the difference of two strictly increasing continuous functions.

16. Given f ∈ BV[a, b], define g(x) = f(x+) for a ≤ x < b and g(b) = f(b). Prove that g is right continuous and of bounded variation on [a, b].


From our investigations into the structure of monotone functions in Chapter Two (see Exercise 2.36) it follows that each function of bounded variation can be written as the sum of a continuous function of bounded variation plus a saltus, or “pure jump,” function. Specifically, let f ∈ BV[a, b], and let (x_n) be an enumeration of the discontinuities of f. For each n, let a_n = f(x_n) − f(x_n−) and b_n = f(x_n+) − f(x_n) be the left and right “jumps” in the graph of f, where a_n = 0 if x_n = a and b_n = 0 if x_n = b. Since f is of bounded variation, it follows that Σ_{n=1}^∞ |a_n| < ∞ and Σ_{n=1}^∞ |b_n| < ∞. (Why?) We obtain the “continuous part” of f by subtracting these jumps. To simplify our notation, we will define two auxiliary functions:

$$ I(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases} \qquad \text{and} \qquad J(x) = \begin{cases} 0 & \text{if } x \le 0, \\ 1 & \text{if } x > 0. \end{cases} $$


Now, let h(x) = Σ_{n=1}^∞ [a_n I(x − x_n) + b_n J(x − x_n)], and let g = f − h. From Exercise 14, h is of bounded variation, and hence so is g. Moreover, from Exercise 2.36, g is actually continuous. By design, f = g + h.
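For a single jump, the decomposition f = g + h is easy to carry out by hand. The Python sketch below does so for a function of our own choosing on [0, 1]: f(x) = x² for x < 1/2, f(1/2) = 0.55, and f(x) = x² + 1 for x > 1/2. Here a_1 = f(1/2) − f(1/2−) = 0.3 and b_1 = f(1/2+) − f(1/2) = 0.7, and g = f − h should be the continuous function x². The jump location and one-sided limits are supplied by hand rather than detected.

```python
def I(x):
    return 1 if x >= 0 else 0

def J(x):
    return 1 if x > 0 else 0

def f(x):
    """Hypothetical f on [0, 1]: x**2 with one jump at 1/2; f(1/2) lies between the one-sided limits."""
    if x < 0.5:
        return x**2
    if x == 0.5:
        return 0.25 + 0.3
    return x**2 + 1

x1 = 0.5                    # the single discontinuity, assumed known
a1 = f(x1) - 0.25           # left jump:  a_1 = f(x_1) - f(x_1-)
b1 = (0.25 + 1) - f(x1)     # right jump: b_1 = f(x_1+) - f(x_1)

def h(x):
    return a1 * I(x - x1) + b1 * J(x - x1)   # the saltus ("pure jump") part

def g(x):
    return f(x) - h(x)                       # the continuous part; here it should equal x**2

samples = [k / 8 for k in range(9)] + [0.5 - 1e-9, 0.5 + 1e-9]
print([(x, g(x)) for x in samples])
```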

Returning to our discussion of Jordan’s theorem, notice that the decomposition of a 
function of bounded variation into the difference of increasing functions is by no means 
unique: / = g - h = (g + 1) - (h + 1). By making a clever choice, however, we can 
instill a certain amount of uniqueness into the decomposition. 

Given f ∈ BV[a, b] and v(x) = V_a^x f, we define the positive variation of f by

$$ p(x) = \tfrac{1}{2}\big( v(x) + f(x) - f(a) \big) $$

and the negative variation of f by

$$ n(x) = \tfrac{1}{2}\big( v(x) - f(x) + f(a) \big). $$

Obviously, v(x) = p(x) + n(x) and f(x) = f(a) + p(x) − n(x). We will show that p and n are increasing, thus giving an alternate representation of f as the difference of increasing functions.
Proposition 13.11. Let f ∈ BV[a, b], and let v, p, and n be defined as above. Then:

(i) 0 ≤ p ≤ v and 0 ≤ n ≤ v.

(ii) p and n are increasing functions on [a, b].

(iii) If g and h are increasing functions on [a, b] such that f = g − h, then V_x^y p ≤ V_x^y g and V_x^y n ≤ V_x^y h for all x < y in [a, b].

proof. We will prove (i) and (ii) and leave (iii) as an exercise. The point to (iii) is that p and n give, in a sense, a minimal decomposition of f. To prove (i), recall that

$$ v(x) = V_a^x f \ge |f(x) - f(a)| \ge \pm\big( f(x) - f(a) \big). $$

Thus, p ≥ 0 and n ≥ 0. Since p + n = v, we must also have p ≤ v and n ≤ v.

To see that p is increasing, we essentially repeat this calculation. Take x < y in [a, b] and notice that

$$ \begin{aligned} 2\big( p(y) - p(x) \big) &= v(y) - v(x) + f(y) - f(x) \\ &= V_x^y f + f(y) - f(x) \\ &\ge |f(y) - f(x)| + f(y) - f(x) \ge 0. \end{aligned} $$

And similarly for n. □

Since f − f(a) = p − n, it follows that V_a^b f = V_a^b (f − f(a)) ≤ V_a^b p + V_a^b n. We have taken “the” choice of p and n that gives equality here:

$$ V_a^b p + V_a^b n = p(b) + n(b) = v(b) = V_a^b f, $$

since p and n are increasing and vanish at a. Notice, too, that this gives ‖f‖_BV = |f(a)| + p(b) + n(b). We can use this fact to clean up an earlier, less than satisfactory estimate.
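Continuing the grid experiment, the positive and negative variations can be computed directly from their definitions. The Python sketch below uses the arbitrarily chosen test function f(t) = sin 6t − t on [0, 1]. It checks that p and n start at 0, are nondecreasing, and satisfy p + n = v and p − n = f − f(a), so that p(b) + n(b) = V_a^b f on the grid.

```python
import math

ts = [k / 1000 for k in range(1001)]                  # grid on [a, b] = [0, 1]
fs = [math.sin(6 * t) - t for t in ts]                # arbitrary test function
v = [0.0]
for prev, cur in zip(fs, fs[1:]):
    v.append(v[-1] + abs(cur - prev))                 # grid stand-in for V_a^x f
p = [(vi + fi - fs[0]) / 2 for vi, fi in zip(v, fs)]  # positive variation
n = [(vi - fi + fs[0]) / 2 for vi, fi in zip(v, fs)]  # negative variation
print(p[-1] + n[-1], v[-1])
```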

Proposition 13.12. ‖f_1 f_2‖_BV ≤ ‖f_1‖_BV ‖f_2‖_BV.

proof. Write f_1 = p_1 − n_1 + f_1(a) and f_2 = p_2 − n_2 + f_2(a), as in Proposition 13.11. As pointed out above, this yields ‖f_1‖_BV = |f_1(a)| + p_1(b) + n_1(b) and ‖f_2‖_BV = |f_2(a)| + p_2(b) + n_2(b). Next, write

$$ \begin{aligned} f_1 f_2 = {} & p_1 p_2 + n_1 n_2 + f_1(a)\, p_2 + f_2(a)\, p_1 \\ & - n_1 p_2 - n_2 p_1 - f_1(a)\, n_2 - f_2(a)\, n_1 \\ & + f_1(a) f_2(a). \end{aligned} $$

Each term save the last (a constant) is a monotone function vanishing at a. Finally, we apply the triangle inequality in BV[a, b]:

$$ \begin{aligned} \|f_1 f_2\|_{BV} &= V_a^b(f_1 f_2) + |f_1(a)|\,|f_2(a)| \\ &\le V_a^b(p_1 p_2) + V_a^b(n_1 n_2) + \cdots + V_a^b\big(f_2(a)\, n_1\big) + |f_1(a)|\,|f_2(a)| \\ &= p_1(b)\, p_2(b) + n_1(b)\, n_2(b) + |f_1(a)|\, p_2(b) + |f_2(a)|\, p_1(b) \\ &\quad + n_1(b)\, p_2(b) + n_2(b)\, p_1(b) + |f_1(a)|\, n_2(b) + |f_2(a)|\, n_1(b) \\ &\quad + |f_1(a)|\,|f_2(a)| \\ &= \big( p_1(b) + n_1(b) + |f_1(a)| \big)\big( p_2(b) + n_2(b) + |f_2(a)| \big) \\ &= \|f_1\|_{BV}\, \|f_2\|_{BV}. \end{aligned} $$

Phew! □
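Proposition 13.12 can be spot-checked on step functions. Suppose a function takes the values f_0, f_1, ..., f_m on consecutive intervals [t_i, t_{i+1}) of a fixed partition, so that f(a) = f_0. Then ‖f‖_BV = |f_0| + Σ|f_i − f_{i−1}|, and the product of two such functions is again of this form. The Python sketch below draws random step functions of this kind (the sizes and ranges are arbitrary choices) and records the largest ratio ‖f_1 f_2‖_BV / (‖f_1‖_BV ‖f_2‖_BV), which should never exceed 1.

```python
import random

def bv_norm(values):
    """||f||_BV = |f(a)| + V_a^b f for a step function with the given consecutive values."""
    return abs(values[0]) + sum(abs(y - x) for x, y in zip(values, values[1:]))

random.seed(2)
worst = 0.0
for _ in range(1000):
    f1 = [random.uniform(-2, 2) for _ in range(12)]
    f2 = [random.uniform(-2, 2) for _ in range(12)]
    prod = [x * y for x, y in zip(f1, f2)]  # values of the product step function
    worst = max(worst, bv_norm(prod) / (bv_norm(f1) * bv_norm(f2)))
print(worst)
```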


EXERCISES 

17. Prove part (iii) of Proposition 13.11. [Hint: If f = g − h, then V_x^y f ≤ V_x^y g + V_x^y h = g(y) − g(x) + h(y) − h(x).]

18. In the notation of Proposition 13.11, show that each point of continuity for / is 
also a point of continuity for both p and n. 

19. Suppose that f has a continuous derivative on [a, b].

(a) Use the mean value theorem to show that V(f, P) can be written as a Riemann sum for |f′| over P.

(b) Show that V_a^b f = ∫_a^b |f′(t)| dt.

(c) Conclude that p(x) = ∫_a^x [f′]^+(t) dt and n(x) = ∫_a^x [f′]^−(t) dt, where [f′]^+ and [f′]^− are the positive and negative parts of f′.


If $f'(t) = (x'(t), y'(t))$ is continuous on $[a,b]$, it follows from Exercise 19 that $f$ is then a rectifiable curve and its length is given by a Riemann integral: $V_a^b f = \int_a^b \|f'(t)\|_2\,dt$. In the parlance of calculus, $ds/dt = \|f'\|_2$ defines the speed of a particle traveling along the path $f$, and $V_a^b f$ is the total distance traveled by the particle from time $a$ to time $b$.
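For a concrete instance of the arc-length formula, take the parabola $f(t) = (t, t^2)$ on $[0,1]$. This example is our choice, not the text's. The polygonal lengths $V(f,P)$ over uniform partitions should increase toward $\int_0^1 \sqrt{1 + 4t^2}\,dt = \tfrac{\sqrt 5}{2} + \tfrac14 \sinh^{-1} 2$, which is the value of $\int_a^b \|f'(t)\|_2\,dt$ here. The helper name `polygonal_length` is hypothetical.

```python
import math

def polygonal_length(x, y, a, b, n):
    """V(f, P) for the curve f(t) = (x(t), y(t)) and the uniform partition of [a, b] into n pieces."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    pts = [(x(t), y(t)) for t in ts]
    return sum(math.dist(pts[k - 1], pts[k]) for k in range(1, n + 1))

exact = math.sqrt(5) / 2 + math.asinh(2) / 4   # integral of ||f'(t)||_2 = sqrt(1 + 4t^2) over [0, 1]
ns = (1, 4, 16, 64, 256)
lengths = [polygonal_length(lambda t: t, lambda t: t * t, 0.0, 1.0, n) for n in ns]
for n, length in zip(ns, lengths):
    print(n, round(length, 6), round(exact - length, 8))
```

Each partition in the list refines the previous one, so the triangle inequality forces the lengths to increase, in line with $V_a^b f = \sup_P V(f,P)$.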

One of our goals is to make sense out of the formula in Exercise 19 in the case where $f'$ is not continuous, or fails to exist at several points, or, for that matter, fails to be Riemann integrable. But this raises two big questions: If $f$ is of bounded variation, does $f'$ exist at enough points to at least be integrable? And what does it mean for a function to be integrable anyway? Our first attempt to salvage the formula will be to write it in the “differential” form $\int_a^b |df(t)|$, and, for this to make sense, we will need more detailed information about integrals.


Helly’s First Theorem 

Next we present a compactness result, of sorts, for BV[a,b] that will prove useful in 
the next chapter (where we will also meet Helly’s Second Theorem). We begin with 
two lemmas of independent interest. The first of these we have already encountered 
informally; the technique involved is sometimes called diagonalization. 

Helly’s Selection Principle 13.13. Let $(f_n)$ be a uniformly bounded sequence of real-valued functions defined on a set $X$, and let $D$ be any countable subset of $X$. Then there is a subsequence of $(f_n)$ that converges pointwise on $D$.
Proof. Suppose that $|f_n(x)| \le K$ for all $n$ and all $x \in X$, and let $D = \{x_k : k \ge 1\}$. Then, in particular, since the sequence $(f_n(x_1))$ is bounded, we can pass to a subsequence $(f_n^{(1)})$ of $(f_n)$ such that $(f_n^{(1)}(x_1))$ converges.

But now the sequence $(f_n^{(1)}(x_2))$ is also bounded, so we can pass to a subsequence $(f_n^{(2)})$ of $(f_n^{(1)})$ such that $(f_n^{(2)}(x_2))$ converges. Since we have taken care to choose a subsequence of $(f_n^{(1)})$, we also have that $(f_n^{(2)}(x_1))$ converges.

Next, since $(f_n^{(2)}(x_3))$ is bounded, we can find a further subsequence $(f_n^{(3)})$ of $(f_n^{(2)})$ such that $(f_n^{(3)}(x_3))$ converges. We necessarily also have that $(f_n^{(3)}(x_2))$ and $(f_n^{(3)}(x_1))$ converge. By induction, we can find a subsequence $(f_n^{(m+1)})$ of $(f_n^{(m)})$ such that $(f_n^{(m+1)}(x_k))_{n=1}^\infty$ converges for each $k = 1, 2, \ldots, m+1$.

The claim is that the “diagonal” sequence $(f_n^{(n)})_{n=1}^\infty$ converges at $x_k$ for every $k$. Why? Because, for any $k$, the tail sequence $(f_n^{(n)}(x_k))_{n=k}^\infty$ is a subsequence of $(f_n^{(k)}(x_k))_{n=1}^\infty$. □

The following lemma should remind you of our technique for extending the definition 
of the Cantor function. 

Lemma 13.14. Let $D$ be a subset of $[a,b]$ with $a \in D$ and $b = \sup D$. If $f : D \to \mathbb{R}$ is increasing, then $f$ extends to an increasing function on all of $[a,b]$.

Proof. For $x \in [a,b]$, define $g(x) = \sup\{f(t) : a \le t \le x,\ t \in D\}$. It is immediate that $g$ is increasing and that $g(x) = f(x)$ whenever $x \in D$. □

We next apply these results to a sequence of increasing functions on an interval 
[a,b]. 

Lemma 13.15. If $(f_n)$ is a uniformly bounded sequence of increasing functions on $[a,b]$, that is, if $|f_n(x)| \le K$ for all $n$ and all $x$ in $[a,b]$, then some subsequence of $(f_n)$ converges pointwise to an increasing function $f$ on $[a,b]$ (which also satisfies $|f(x)| \le K$).

Proof. Let $D$ be the set of all rationals in $[a,b]$ together with the point $a$, if $a$ is irrational. By applying Helly’s Selection Principle to the sequence $(f_n)$ and the countable set $D$, there is a subsequence $(f_{n_k})$ of $(f_n)$ such that $\varphi(x) = \lim_{k\to\infty} f_{n_k}(x)$ exists for all $x \in D$. It is easy to see that this defines $\varphi$ as an increasing function on $D$. By Lemma 13.14, we may suppose that $\varphi$ has been extended to an increasing function on all of $[a,b]$.

We next show that $\varphi(x) = \lim_{k\to\infty} f_{n_k}(x)$ at any point $x$ where $\varphi$ is continuous. Given such an $x$ and $\varepsilon > 0$, choose rationals $p$ and $q$ in $[a,b]$ such that $p < x < q$ and $\varphi(q) - \varphi(p) < \varepsilon/2$. Then, for all $k$ sufficiently large, we have

$$f_{n_k}(x) \le f_{n_k}(q) < \varphi(q) + \varepsilon/2 < \varphi(x) + \varepsilon,$$

and, similarly, $f_{n_k}(x) > \varphi(x) - \varepsilon$. Thus, $\varphi(x) = \lim_{k\to\infty} f_{n_k}(x)$ for any $x \notin D(\varphi)$, the set of discontinuities of $\varphi$.

Since $\varphi$ is increasing, $D(\varphi)$ is at most countable. Now here comes the clincher! Apply Helly’s Selection Principle again, this time using the sequence $(f_{n_k})$ and the countable set $D(\varphi)$. We choose a further subsequence of $(f_{n_k})$, which we again label $(f_{n_k})$, such that $\lim_{k\to\infty} f_{n_k}(x)$ exists for all $x \in D(\varphi)$ and, hence, for all $x$ in $[a,b]$. If we set $f(x) = \lim_{k\to\infty} f_{n_k}(x)$, then $f$ is clearly increasing. □

Finally, we are ready to apply these techniques to $BV[a,b]$.

Helly’s First Theorem 13.16. Let $(f_n)$ be a bounded sequence in $BV[a,b]$; that is, suppose that $\|f_n\|_{BV} \le K$ for all $n$. Then some subsequence of $(f_n)$ converges pointwise on $[a,b]$ to a function $f \in BV[a,b]$ (which also satisfies $\|f\|_{BV} \le K$).

Proof. First, note that since $\|f_n\|_\infty \le \|f_n\|_{BV} \le K$ for all $n$, the sequence $(f_n)$ is uniformly bounded. Next, if we write $v_n(x) = V_a^x f_n$, then $|v_n(x)| \le V_a^b f_n \le K$ and $|v_n(x) - f_n(x)| \le 2K$ for all $n$. That is, $(f_n)$ is the difference of two uniformly bounded sequences of increasing functions, $(v_n)$ and $(v_n - f_n)$. By repeated application of Lemma 13.15, we can find a common subsequence $(n_k)$ such that both $g(x) = \lim_{k\to\infty} v_{n_k}(x)$ and $h(x) = \lim_{k\to\infty}\bigl(v_{n_k}(x) - f_{n_k}(x)\bigr)$ exist at each point $x$ in $[a,b]$. (How?) It is easy to see that $g$ and $h$ are increasing functions and, hence, that $f = g - h$ is of bounded variation. Of course, $f(x) = \lim_{k\to\infty} f_{n_k}(x)$ for all $x$ in $[a,b]$. Finally, it follows from Exercise 11 that $\|f\|_{BV} \le K$. □

Helly’s theorem is something of a compactness result in that it provides a convergent subsequence for any bounded sequence in $BV[a,b]$. Unfortunately, the convergence here is pointwise and not necessarily convergence in the metric of $BV[a,b]$ (recall that convergence in $BV[a,b]$ is even harder to come by than uniform convergence).


Notes and Remarks 

According to Lakatos [1976], functions of bounded variation were discovered by 
Camille Jordan through a “critical re-examination” of Dirichlet’s famous flawed proof 
that arbitrary functions can be represented by Fourier series; see Jordan [1881]. It was 
Jordan who gave the characterization of such functions as differences of increasing 
functions (Corollary 13.6), but, as pointed out by Hawkins [1970], the key observa- 
tion that Dirichlet’s proof was valid for differences of increasing functions had already 
been made by du Bois-Reymond [1880]. The connection between rectifiable curves 
and functions of bounded variation is also due to Jordan and can be found in Jor- 
dan [1893]. Curiously, the representation of arc length by means of a definite integral 
was considered inappropriate and overly restrictive. As Hawkins puts it: “Success 
in this direction required a more flexible definition of the integral and the genius of 
Lebesgue.” 

The results in Exercise 6 are (essentially) due to Lebesgue; see Hobson [1927, Vol. I] and Lebesgue [1928]. The proof of Proposition 13.12 is taken from Kuller [1969], but also see Bullen [1983] and Russell [1979]. Lemma 13.14 is taken from Lojasiewicz [1988].

Helly’s theorems can be found in Helly [1912]. For more on saltus functions and Helly’s theorem (Theorem 13.16), see Natanson [1955, Vol. I] or Lojasiewicz [1988]. For more on Eduard Helly, the Austrian mathematician whose work had a profound influence on Riesz and Banach, see Hochstadt [1980] and a follow-up letter from Monna [1980].



CHAPTER FOURTEEN 


The Riemann-Stieltjes Integral 


Weights and Measures 


Several times throughout this book we’ve hinted at a physical basis for some of our 
notation. It’s time that we made this more precise; a simple calculus problem will help 
explain. 

Consider a thin rod, or wire, positioned along the interval $[a,b]$ on the $x$-axis and having a nonuniform distribution of mass. For example, the rod might vary slightly in thickness or in density (mass per unit length) as $x$ varies. Our job is to compute the density (at a point) as a function $f(x)$, if at all possible.

What we can measure effectively is the distribution of mass along the rod. That is, we can easily measure the mass of any segment of the rod, and so we know the mass of the segment lying along the interval $[a,x]$ as a function $F(x)$. Said in slightly different terms, we are able to measure small, discrete “chunks” of mass as $dm = F(x + dx) - F(x) = dF$, and so we’re led to define the density $f(x) = dm/dx = F'(x)$ as the derivative of the distribution $F(x)$, provided that $F$ is differentiable, of course.

But $F$ is an arbitrary increasing function; is every such function differentiable? And, if not, can we say anything meaningful about this problem? Could we, for example, still find the center of mass (the line $x = \mu$ through which the rod balances) when $F$ is not differentiable?

As it happens, most of what we need to know about the rod, from a physical standpoint, depends not on differentiation but on integration. And integrals are easier to come by than derivatives. To see this, let’s simply use the pure formalism of first-year calculus and continue to write $dF$ as the mass of a small “chunk” of the rod. Given this, the total mass is then $m = \int_a^b dF(x) = F(b) - F(a)$. And, as you might recall, we can also compute various moments as integrals, too:

$$\mu = \frac{1}{m}\int_a^b x\,dF(x) \qquad \text{(center of mass)},$$

$$\sigma^2 = \frac{1}{m}\int_a^b (x - \mu)^2\,dF(x) \qquad \text{(moment of inertia about } \mu\text{)},$$

and so on. We might even want to consider various measurements $\varphi$ and compute expressions such as

$$\int_a^b \varphi(x)\,dF(x) \qquad \text{(expected value of } \varphi\text{)}.$$
In other words, the claim here is that it is possible to make sense out of these “generalized” Riemann integrals without making any assumptions on the differentiability of $F$. If, however, $F$ should have a density (i.e., if $F'$ exists), then we would want our new integral to be consistent with the Riemann integral. In this case, we would expect to have

$$\int_a^b \varphi(x)\,dF(x) = \int_a^b \varphi(x)\,F'(x)\,dx.$$

In particular, we will see to it that the case $F(x) = x$ leads to the Riemann integral.
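Nothing in the center-of-mass formula requires $F'$ to exist everywhere, as a hypothetical rod on $[0,1]$ shows. Its mass distribution is $F(x) = x^2 + H(x - 3/4)$, which combines a smooth density $2x$ with a point mass of $1$ at $x = 3/4$. Here $H(s) = 1$ for $s \ge 0$ and $0$ otherwise; this is an assumption made for illustration. Approximating $\int x\,dF(x)$ by sums $\sum_i t_i\,[F(x_i) - F(x_{i-1})]$, with midpoint tags $t_i$ chosen for this sketch, should give:

- total mass $m = F(1) - F(0) = 2$;
- center of mass $\mu = \frac{1}{m}\bigl(\int_0^1 2x^2\,dx + \tfrac34\bigr) = \tfrac{17}{24}$.

The function names are ours.

```python
def stieltjes_sum(phi, F, a, b, n):
    """Sum of phi(t_i) [F(x_i) - F(x_{i-1})] over a uniform partition, with midpoint tags t_i."""
    total = 0.0
    for i in range(1, n + 1):
        x0 = a + (b - a) * (i - 1) / n
        x1 = a + (b - a) * i / n
        total += phi((x0 + x1) / 2) * (F(x1) - F(x0))
    return total

def F(x):
    """Mass of the segment [0, x]: smooth part x^2 plus a unit point mass at 3/4 (hypothetical rod)."""
    return x * x + (1.0 if x >= 0.75 else 0.0)

m = F(1.0) - F(0.0)                                  # total mass, here 2
mu = stieltjes_sum(lambda x: x, F, 0.0, 1.0, 1000) / m
print(m, mu, 17 / 24)
```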

There are several issues at hand here. First, given an arbitrary increasing function $F$ on $[a,b]$, we will attack the problem of interpreting integrals of the form $\int_a^b \varphi(x)\,dF(x)$. It won’t surprise you to learn that we will define this new integral as the limit, in some appropriate sense, of Riemann-type sums of the form $\sum_i \varphi(t_i)\,[F(x_i) - F(x_{i-1})]$. What we will have, if we are careful, is a generalization of the Riemann integral. What may surprise you, though, is that there are a number of reasonable ways to accomplish this. Our first attempt at extending the integral will by no means be the most general, but it will suffice for now.

Next we will take up the more difficult question of when (or if) our new integral is 
actually a Riemann integral. For this we will want to know whether F is differentiable 
and, if so, whether F' is Riemann integrable. The answer, as we will see, lies in further 
refining the Riemann integral. In short, we will generalize our generalization. First 
things first, though. 


The Riemann-Stieltjes Integral 

We begin by fixing our notation. Throughout this section, we consider a nonconstant increasing function $\alpha : [a,b] \to \mathbb{R}$ and a bounded function $f : [a,b] \to \mathbb{R}$ (the function $\alpha$ is our “distribution” or “weight,” $F$, and $f$ is our “measurement,” $\varphi$). We next set up the notation necessary to define the Riemann-Stieltjes integral $\int_a^b f\,d\alpha$.

Given a partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ of $[a,b]$, we write $\Delta\alpha_i = \alpha(x_i) - \alpha(x_{i-1})$, for $i = 1, \ldots, n$. Note that $\Delta\alpha_i \ge 0$ for all $i$, and that $\sum_{i=1}^n \Delta\alpha_i = \alpha(b) - \alpha(a)$. Next, for each $i = 1, \ldots, n$, we define

$$m_i = \inf\{f(x) : x_{i-1} \le x \le x_i\}, \qquad M_i = \sup\{f(x) : x_{i-1} \le x \le x_i\}.$$

We will also need

$$m = \inf\{f(x) : a \le x \le b\} = \min\{m_1, \ldots, m_n\}, \qquad M = \sup\{f(x) : a \le x \le b\} = \max\{M_1, \ldots, M_n\}.$$

Note that $m \le m_i \le M_i \le M$ for any $i = 1, \ldots, n$.

We define the lower Riemann-Stieltjes sum of $f$ over $P$, with respect to $\alpha$, by $L(f,P) = \sum_{i=1}^n m_i\,\Delta\alpha_i$, and the upper Riemann-Stieltjes sum of $f$ over $P$, with respect to $\alpha$, by $U(f,P) = \sum_{i=1}^n M_i\,\Delta\alpha_i$. If we should need to refer to $\alpha$, we will write $L_\alpha(f,P)$ and $U_\alpha(f,P)$. For the time being at least, we will think of $\alpha$ as fixed and so ignore several of these additional quantifiers; we will refer to $L(f,P)$ and $U(f,P)$ as simply a lower sum and an upper sum. Clearly, $L(f,P) \le U(f,P)$ for any partition $P$. Notice, too, that $L(-f,P) = -U(f,P)$.
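The definitions translate directly into code. The sketch below approximates each $m_i$ and $M_i$ by sampling $f$ at the two endpoints of $[x_{i-1}, x_i]$ plus equally spaced interior points. It is therefore exact for monotone $f$, whose inf and sup on a closed interval occur at the endpoints, but only approximate in general. The name `lower_upper_sums` and the example $f(x) = x$, $\alpha(x) = x^2$, $P = \{0, \tfrac12, 1\}$ are our own choices.

```python
def lower_upper_sums(f, alpha, P, samples=50):
    """Approximate (L(f, P), U(f, P)) with respect to an increasing alpha.

    Each inf/sup over [x_{i-1}, x_i] is estimated from both endpoints plus
    samples - 1 interior points, so the result is exact when f is monotone.
    """
    L = U = 0.0
    for x0, x1 in zip(P, P[1:]):
        vals = [f(x0), f(x1)] + [f(x0 + (x1 - x0) * j / samples) for j in range(1, samples)]
        d_alpha = alpha(x1) - alpha(x0)   # Delta alpha_i >= 0 because alpha is increasing
        L += min(vals) * d_alpha          # m_i * Delta alpha_i
        U += max(vals) * d_alpha          # M_i * Delta alpha_i
    return L, U

f = lambda x: x
alpha = lambda x: x * x
P = [0.0, 0.5, 1.0]
print(lower_upper_sums(f, alpha, P))                # (0.375, 0.875)
print(lower_upper_sums(lambda x: -f(x), alpha, P))  # (-0.875, -0.375), i.e. L(-f, P) = -U(f, P)
```

The value $\int_0^1 x\,d(x^2) = \int_0^1 2x^2\,dx = \tfrac23$ lies between the two sums, as it must.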

As you would imagine, we want to take “limits” of upper and lower sums to define 
our new integral. A few simple observations will clarify the process. 

Proposition 14.1. If $P \subset Q$ are partitions of $[a,b]$, then $L(f,P) \le L(f,Q)$ and $U(f,Q) \le U(f,P)$.

Proof. We first prove the inequality concerning lower sums. By induction (on the number of elements of $Q \setminus P$) it is enough to consider the case $Q = P \cup \{x_1\}$, and for this it is enough to establish $L(f,\{a,b\}) \le L(f,\{a,x_1,b\})$. (Why?) Now, if we set $m_1 = \inf\{f(x) : a \le x \le x_1\}$ and $m_2 = \inf\{f(x) : x_1 \le x \le b\}$, then

$$
\begin{aligned}
L(f,\{a,b\}) &= m\,[\alpha(b) - \alpha(a)] \\
&= m\,[\alpha(x_1) - \alpha(a)] + m\,[\alpha(b) - \alpha(x_1)] \\
&\le m_1[\alpha(x_1) - \alpha(a)] + m_2[\alpha(b) - \alpha(x_1)] \\
&= L(f,\{a,x_1,b\}).
\end{aligned}
$$

The proof for upper sums is similar but, since $U(f,P) = -L(-f,P)$, it actually follows from what we have already shown. □

Corollary 14.2. $L(f,P) \le U(f,Q)$ for any partitions $P$, $Q$ of $[a,b]$.

Proof. $L(f,P) \le L(f,P \cup Q) \le U(f,P \cup Q) \le U(f,Q)$. □


Here is where we stand: For any partitions $P$ and $Q$, we have

$$m[\alpha(b) - \alpha(a)] \le L(f,P) \le U(f,Q) \le M[\alpha(b) - \alpha(a)].$$

As we increase the number of points in our partition, the lower sums increase while the upper sums decrease. Thus we are led to consider the lower Riemann-Stieltjes integral of $f$ with respect to $\alpha$ over $[a,b]$ defined by

$$\underline{\int_a^b} f\,d\alpha = \sup_P L(f,P)$$

and the upper Riemann-Stieltjes integral of $f$ with respect to $\alpha$ over $[a,b]$ defined by

$$\overline{\int_a^b} f\,d\alpha = \inf_P U(f,P).$$

Clearly,

$$m[\alpha(b) - \alpha(a)] \le \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha \le M[\alpha(b) - \alpha(a)].$$

If the upper and lower integrals of $f$ should agree, then we say that $f$ is Riemann-Stieltjes integrable with respect to $\alpha$ over $[a,b]$, and we define the Riemann-Stieltjes integral of $f$ with respect to $\alpha$ over $[a,b]$ to be their common value

$$\int_a^b f(x)\,d\alpha(x) = \int_a^b f\,d\alpha = \underline{\int_a^b} f\,d\alpha = \overline{\int_a^b} f\,d\alpha.$$

When $\alpha(x) = x$, this definition yields the Riemann integral of $f$ over $[a,b]$. In this case, we will use the familiar notation $\int_a^b f(x)\,dx$ or, occasionally, just $\int_a^b f$.


Examples 14.3

(a) If $f(x) = c$ is a constant function, then $f$ is Riemann-Stieltjes integrable with respect to every increasing $\alpha$ and $\int_a^b f\,d\alpha = c[\alpha(b) - \alpha(a)]$. (Why?) Likewise, if $\alpha$ is constant, then every bounded function is integrable with respect to $\alpha$, but, of course, $\int_a^b f\,d\alpha = 0$ for any $f$. Not very interesting. Unless we need to specifically consider this trivial case, we will always assume that $\alpha$ is nonconstant.

(b) In general, not every bounded function is integrable. For example, $\chi_{\mathbb{Q}}$ is not Riemann integrable on any interval $[a,b]$. To see this, just check that $U(\chi_{\mathbb{Q}},P) = b - a$ and $L(\chi_{\mathbb{Q}},P) = 0$ for any partition $P$ of $[a,b]$. That is, $\overline{\int_a^b} \chi_{\mathbb{Q}} = b - a$ while $\underline{\int_a^b} \chi_{\mathbb{Q}} = 0$. Essentially the same argument shows that $\chi_{\mathbb{Q}}$ is not integrable with respect to any (nonconstant) increasing $\alpha$.

(c) A simple example of a Stieltjes integral, although not precisely of the type we have defined, is provided by a contour integral, or line integral. Such integrals are frequently used in complex analysis and might be written $\int_\Gamma f(z)\,dz$, where $\Gamma$ is a curve in the complex plane. If $\gamma(t)$, $a \le t \le b$, is a parameterization of $\Gamma$, then we would write

$$\int_\Gamma f(z)\,dz = \int_a^b f(\gamma(t))\,d\gamma(t).$$

In practice, of course, the contours that are actually used are often very simple. For instance, if $\Gamma$ is the circle of radius $r$ about $0$, then $\gamma(t) = re^{it}$, $0 \le t \le 2\pi$, and our contour integral reduces to the Riemann integral $\int_0^{2\pi} f(re^{it})\,rie^{it}\,dt$. In full generality, though, $\gamma(t)$ need not be everywhere differentiable, and so the generic contour integral is necessarily a Stieltjes integral.
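Example (c) can be tried numerically. The Python sketch below takes the circle $\gamma(t) = re^{it}$, $0 \le t \le 2\pi$, with the helper name `contour_sum` and the radius $r = 2$ being our own choices. It forms the Stieltjes sums $\sum_i f(\gamma(t_i^*))\,[\gamma(t_i) - \gamma(t_{i-1})]$ with midpoint tags $t_i^*$, so no derivative of $\gamma$ is ever used. For $f(z) = 1/z$, each term equals $2i\sin(h/2)$ with $h = 2\pi/n$, and the sum approaches the familiar value $2\pi i$.

```python
import cmath
import math

def contour_sum(f, gamma, a, b, n):
    """Stieltjes sum  sum_i f(gamma(t_i*)) [gamma(t_i) - gamma(t_{i-1})]  with midpoint tags t_i*."""
    total = 0j
    for i in range(n):
        t0 = a + (b - a) * i / n
        t1 = a + (b - a) * (i + 1) / n
        total += f(gamma((t0 + t1) / 2)) * (gamma(t1) - gamma(t0))
    return total

r = 2.0
gamma = lambda t: r * cmath.exp(1j * t)   # circle of radius r about 0, traversed once
I = contour_sum(lambda z: 1 / z, gamma, 0.0, 2 * math.pi, 1000)
print(I, 2j * math.pi)
```

With $f \equiv 1$ the sum telescopes to $\gamma(2\pi) - \gamma(0) = 0$, as expected for a closed contour.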

We write $\mathcal{R}_\alpha[a,b]$ to denote the collection of all bounded functions on $[a,b]$ which are Riemann-Stieltjes integrable with respect to $\alpha$. When $\alpha(x) = x$, we simply write $\mathcal{R}[a,b]$ for the space of Riemann integrable functions on $[a,b]$. In any case, notice that (by definition) $\mathcal{R}_\alpha[a,b] \subset B[a,b]$.

As you might imagine, we will eventually check that $\mathcal{R}_\alpha[a,b]$ is a vector space, an algebra, a lattice, a normed space, and so on. To begin, though, we need a simple criterion for Riemann-Stieltjes integrability.


Theorem 14.4 (Riemann’s Condition). Let $\alpha : [a,b] \to \mathbb{R}$ be increasing. A bounded function $f : [a,b] \to \mathbb{R}$ is in $\mathcal{R}_\alpha[a,b]$ if and only if, given $\varepsilon > 0$, there exists a partition $P$ of $[a,b]$ such that $U(f,P) - L(f,P) < \varepsilon$.
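Proof. First, suppose that $f \in \mathcal{R}_\alpha[a,b]$ and let $I = \int_a^b f\,d\alpha$. Given $\varepsilon > 0$, choose partitions $P$ and $Q$ of $[a,b]$ such that $I - \varepsilon/2 < L(f,P)$ and $U(f,Q) < I + \varepsilon/2$. The partition $P^* = P \cup Q$ will do the trick:

$$U(f,P^*) \le U(f,Q) < I + \frac{\varepsilon}{2} < L(f,P) + \varepsilon \le L(f,P^*) + \varepsilon.$$

Next, suppose that for every $\varepsilon > 0$ there is a partition $P$ for which $U(f,P) - L(f,P) < \varepsilon$. Then, since

$$L(f,P) \le \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha \le U(f,P)$$

for any partition $P$, we must have $\overline{\int_a^b} f\,d\alpha - \underline{\int_a^b} f\,d\alpha < \varepsilon$ for every $\varepsilon > 0$. That is, $\overline{\int_a^b} f\,d\alpha = \underline{\int_a^b} f\,d\alpha$. □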


Riemann’s condition makes short work of checking that continuous functions are 
integrable with respect to any increasing integrator. 


Theorem 14.5. $C[a,b] \subset \mathcal{R}_\alpha[a,b]$ for any increasing $\alpha$.


Proof. Let $f : [a,b] \to \mathbb{R}$ be continuous and let $\varepsilon > 0$. Then, since $f$ is actually uniformly continuous, we may choose a $\delta > 0$ so that $|f(x) - f(y)| < \varepsilon$ whenever $|x - y| < \delta$. Now if $P$ is any partition of $[a,b]$ with $x_i - x_{i-1} < \delta$ for all $i$, then $M_i - m_i < \varepsilon$ for all $i$ and hence

$$U(f,P) - L(f,P) = \sum_{i=1}^n (M_i - m_i)\,\Delta\alpha_i < \varepsilon \sum_{i=1}^n \Delta\alpha_i = \varepsilon\,[\alpha(b) - \alpha(a)].$$

□

EXERCISES 

▷ 1. If $f, g \in \mathcal{R}_\alpha[a,b]$ with $f \le g$, show that $\int_a^b f\,d\alpha \le \int_a^b g\,d\alpha$.

2. If $f, g \in \mathcal{R}_\alpha[a,b]$, show that $f + g \in \mathcal{R}_\alpha[a,b]$ and that $\int_a^b (f + g)\,d\alpha = \int_a^b f\,d\alpha + \int_a^b g\,d\alpha$.

3. If $f \in \mathcal{R}_\alpha[a,b]$, show that $|f| \in \mathcal{R}_\alpha[a,b]$ and that $\left|\int_a^b f\,d\alpha\right| \le \int_a^b |f|\,d\alpha$. [Hint: $U(|f|,P) - L(|f|,P) \le U(f,P) - L(f,P)$. Why?]

4. If $f, g \in \mathcal{R}_\alpha[a,b]$, is $fg \in \mathcal{R}_\alpha[a,b]$? How about $f^2$?

5. Give an example where $f^2 \in \mathcal{R}_\alpha[a,b]$ but $f \notin \mathcal{R}_\alpha[a,b]$.

▷ 6. Define increasing functions $\alpha$, $\beta$, and $\gamma$ on $[-1,1]$ by $\alpha = \chi_{(0,1]}$, $\beta = \chi_{[0,1]}$, and $\gamma = \frac12(\alpha + \beta)$. Given $f \in B[-1,1]$, show that:

(a) $f \in \mathcal{R}_\alpha[-1,1]$ if and only if $f(0+) = f(0)$.

(b) $f \in \mathcal{R}_\beta[-1,1]$ if and only if $f(0-) = f(0)$.

(c) $f \in \mathcal{R}_\gamma[-1,1]$ if and only if $f$ is continuous at $0$.

(d) If $f \in \mathcal{R}_\gamma[-1,1]$, then $\int_{-1}^1 f\,d\alpha = \int_{-1}^1 f\,d\beta = \int_{-1}^1 f\,d\gamma = f(0)$.

7. Let $P = \{x_0, \ldots, x_n\}$ be a (fixed) partition of $[a,b]$, and let $\alpha$ be an increasing step function on $[a,b]$ that is constant on each of the open intervals $(x_{i-1}, x_i)$ and has jumps of size $\alpha_i = \alpha(x_i+) - \alpha(x_i-)$ at each of the $x_i$, where $\alpha_0 = \alpha(a+) - \alpha(a)$ and $\alpha_n = \alpha(b) - \alpha(b-)$. If $f \in B[a,b]$ is continuous at each of the $x_i$, show that $f \in \mathcal{R}_\alpha[a,b]$ and $\int_a^b f\,d\alpha = \sum_{i=0}^n f(x_i)\,\alpha_i$.

8. If $f$ is continuous on $[1,n]$, compute $\int_1^n f(x)\,d[x]$, where $[x]$ is the greatest integer in $x$. What is $\int_1^t f(x)\,d[x]$ if $t$ is not an integer?

9. If $f$ is monotone and $\alpha$ is continuous (and still increasing), show that $f \in \mathcal{R}_\alpha[a,b]$.


As a second application of Riemann’s condition, we can now supply an integral 
formula for the total variation in at least one simple case. 

Theorem 14.6. Suppose that $f'$ exists and is Riemann integrable on $[a,b]$. Then $f \in BV[a,b]$ and $V_a^b f = \int_a^b |f'(t)|\,dt$.

Proof. First note that $f$ is continuous on $[a,b]$. Thus, given a partition $P$ of $[a,b]$, we can appeal to the mean value theorem and write

$$V(f,P) = \sum_{i=1}^n |f(x_i) - f(x_{i-1})| = \sum_{i=1}^n |f'(t_i)|\,\Delta x_i,$$

where $t_i \in (x_{i-1}, x_i)$ for each $i$. Consequently,

$$L(|f'|,P) \le V(f,P) \le U(|f'|,P).$$

Since $|f'|$ is Riemann integrable (see Exercise 3), it follows that $f$ is of bounded variation and that $V_a^b f = \int_a^b |f'(t)|\,dt$. □
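Theorem 14.6 can be checked on an example of our own choosing: $f(x) = x^3 - x$ on $[-3/2, 3/2]$. Here $\int |f'| = \int |3x^2 - 1|\,dx = \tfrac{15}{4} + \tfrac{8}{3\sqrt 3}$. The sketch compares $V(f,P)$ over uniform partitions with that value; by the mean value theorem, each $V(f,P)$ is itself a Riemann sum for $|f'|$. The helper name `variation_sum` is hypothetical.

```python
import math

def variation_sum(f, a, b, n):
    """V(f, P) = sum |f(x_i) - f(x_{i-1})| for the uniform partition of [a, b] into n pieces."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(f(xs[k]) - f(xs[k - 1])) for k in range(1, n + 1))

f = lambda x: x ** 3 - x
exact = 15 / 4 + 8 / (3 * math.sqrt(3))   # integral of |f'(x)| = |3x^2 - 1| over [-3/2, 3/2]
ns = (2, 10, 100, 1000)
sums = [variation_sum(f, -1.5, 1.5, n) for n in ns]
for n, s in zip(ns, sums):
    print(n, round(s, 6), round(exact, 6))
```

Every $V(f,P)$ stays below $V_a^b f$. The gap comes only from partitions that miss the turning points $\pm 1/\sqrt 3$.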

We can rephrase Riemann’s condition to look more like the definition of a limit. Indeed, since

$$U(f,P) - L(f,P) \le U(f,P^*) - L(f,P^*) \quad \text{for all } P \supset P^*,$$

we can say that $f \in \mathcal{R}_\alpha[a,b]$ if and only if, for each $n$, there is some partition $P_n$ such that $U(f,P) - L(f,P) < 1/n$ for all refinements $P \supset P_n$, that is, for all partitions “beyond” $P_n$. And we might as well assume that $P_{n+1} \supset P_n$ for all $n$. Thus, $f \in \mathcal{R}_\alpha[a,b]$ if and only if $U(f,P_n) - L(f,P_n) \to 0$ for some increasing sequence of partitions $P_1 \subset P_2 \subset \cdots$. In short, if $f \in \mathcal{R}_\alpha[a,b]$, then Riemann’s condition supplies a particular selection of points from $[a,b]$ that refine our upper and lower estimates for the integral. In this case, $L(f,P_n)$ increases to $\int_a^b f\,d\alpha$ while $U(f,P_n)$ decreases to $\int_a^b f\,d\alpha$.
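The nested-partition form of Riemann’s condition, together with Proposition 14.1, is easy to watch on an example chosen for illustration:

- $f(x) = x^2$ and $\alpha(x) = x^3$ on $[0,1]$;
- dyadic partitions $P_k$ with $2^k$ equal pieces, so that $P_k \subset P_{k+1}$.

Because $f$ is increasing, $m_i = f(x_{i-1})$ and $M_i = f(x_i)$ exactly. The lower sums should increase and the upper sums decrease toward $\int_0^1 x^2\,d(x^3) = \int_0^1 3x^4\,dx = \tfrac35$. The helper name is ours.

```python
def sums_increasing_f(f, alpha, k):
    """L(f, P_k), U(f, P_k) for increasing f on [0, 1], where P_k is the dyadic partition with 2**k pieces."""
    n = 2 ** k
    xs = [i / n for i in range(n + 1)]
    L = sum(f(xs[i - 1]) * (alpha(xs[i]) - alpha(xs[i - 1])) for i in range(1, n + 1))  # m_i = f(x_{i-1})
    U = sum(f(xs[i]) * (alpha(xs[i]) - alpha(xs[i - 1])) for i in range(1, n + 1))      # M_i = f(x_i)
    return L, U

f = lambda x: x * x
alpha = lambda x: x ** 3
rows = [sums_increasing_f(f, alpha, k) for k in range(1, 11)]
for k, (L, U) in enumerate(rows, start=1):
    print(k, round(L, 6), round(U, 6), round(U - L, 6))
```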

Riemann’s condition not only supplies a simple criterion to test for integrability, it also tells us exactly which functions fail to be integrable. To see this, let $f$ be a bounded function on $[a,b]$, let $P = \{x_0, \ldots, x_n\}$ be a partition of $[a,b]$, and write the difference

$$M_i - m_i = \sup_{[x_{i-1},x_i]} f - \inf_{[x_{i-1},x_i]} f = \omega(f; [x_{i-1}, x_i])$$

as the oscillation of $f$ over $[x_{i-1}, x_i]$. Thus, since $\alpha(x_i) - \alpha(x_{i-1}) = \omega(\alpha; [x_{i-1}, x_i])$ for increasing $\alpha$,

$$
\begin{aligned}
U(f,P) - L(f,P) &= \sum_{i=1}^n [M_i - m_i]\,[\alpha(x_i) - \alpha(x_{i-1})] \\
&= \sum_{i=1}^n \omega(f; [x_{i-1}, x_i])\,\omega(\alpha; [x_{i-1}, x_i]) \\
&\ge \sum_{i=1}^n \omega(f; (x_{i-1}, x_i))\,\omega(\alpha; (x_{i-1}, x_i)) \\
&\ge \omega_f(x)\,\omega_\alpha(x) \quad \text{for } x \notin P,
\end{aligned}
$$

where $\omega_f(x)$ and $\omega_\alpha(x)$ denote the oscillations of $f$ and $\alpha$ at the point $x$. In order that $f \in \mathcal{R}_\alpha[a,b]$, then, we must have $\omega_f(x)\,\omega_\alpha(x) = 0$ for “most” values of $x$. In particular, if $f$ and $\alpha$ share a common one-sided discontinuity, say both are discontinuous from the right at $x \in [a,b]$, then $f$ will fail to be integrable with respect to $\alpha$. (See Exercise 6 for several specific examples.)
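Here is a small experiment in the spirit of Exercise 6, with choices of our own. On $[-1,1]$, take $f = \alpha = \chi_{(0,1]}$; both are discontinuous from the right at $0$.

- With $\alpha = \chi_{(0,1]}$ as integrator, the gap $U(f,P) - L(f,P)$ stays at $1$, whatever the uniform partition and whether or not $0$ is a node.
- With the continuous integrator $\alpha(x) = x$, the same $f$ gives a gap of $2/n$.

Each oscillation is estimated from both endpoints plus interior samples, which is exact for this monotone step function. The helper name `oscillation_gap` is hypothetical.

```python
def oscillation_gap(f, alpha, a, b, n, samples=20):
    """U(f, P) - L(f, P) = sum of osc(f; [x_{i-1}, x_i]) * Delta alpha_i for the uniform P with n pieces.

    The oscillation is estimated from both endpoints plus interior samples
    (exact for the monotone step function used below).
    """
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    total = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        vals = [f(x0), f(x1)] + [f(x0 + (x1 - x0) * j / samples) for j in range(1, samples)]
        total += (max(vals) - min(vals)) * (alpha(x1) - alpha(x0))
    return total

step = lambda x: 1.0 if x > 0 else 0.0   # chi_(0,1] restricted to [-1, 1]
identity = lambda x: x
for n in (10, 11, 100, 101, 1000):
    print(n, oscillation_gap(step, step, -1.0, 1.0, n),
          oscillation_gap(step, identity, -1.0, 1.0, n))
```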


EXERCISES 

▷ 10. If $f \in \mathcal{R}_\alpha[a,b]$, show that $f \in \mathcal{R}_\alpha[c,d]$ for every subinterval $[c,d]$ of $[a,b]$. Moreover, $\int_a^b f\,d\alpha = \int_a^c f\,d\alpha + \int_c^b f\,d\alpha$ for every $a < c < b$. In fact, if any two of these integrals exist, then so does the third and the equation above still holds.

▷ 11. If $f \in \mathcal{R}_\alpha[a,b]$ with $m \le f \le M$, show that $\int_a^b f\,d\alpha = c\,[\alpha(b) - \alpha(a)]$ for some $c$ between $m$ and $M$. If $f$ is continuous, show that $c = f(x_0)$ for some $x_0$.

12. Given $f \in \mathcal{R}_\alpha[a,b]$, define $F(x) = \int_a^x f\,d\alpha$ for $a \le x \le b$. Show that $F \in BV[a,b]$. If $\alpha$ is continuous, show that $F \in C[a,b]$.

13. If $\int_a^b f\,d\alpha = 0$ for every $f \in C[a,b]$, show that $\alpha$ is constant.

▷ 14. If $f \in \mathcal{R}_\alpha[a,b]$, and if $U(f,P) - L(f,P) < \varepsilon$ for some partition $P$, show that $\left|\sum_{i=1}^n f(t_i)\,\Delta\alpha_i - \int_a^b f\,d\alpha\right| < \varepsilon$, where $t_i$ is any point in $[x_{i-1}, x_i]$.

15. Suppose there exists a number $I$ with the property that, given any $\varepsilon > 0$, there is a partition $P$ such that $\left|\sum_{i=1}^n f(t_i)\,\Delta\alpha_i - I\right| < \varepsilon$, where $t_i$ is any point in $[x_{i-1}, x_i]$. Show that $f \in \mathcal{R}_\alpha[a,b]$ and $I = \int_a^b f\,d\alpha$.

16. If $U(f,P) - L(f,P) < \varepsilon$, show that $\sum_{i=1}^n |f(t_i) - f(s_i)|\,\Delta\alpha_i < \varepsilon$ for any choice of points $s_i, t_i \in [x_{i-1}, x_i]$.

▷ 17. If $f$ and $\alpha$ share a common one-sided discontinuity in $[a,b]$, show that $f$ is not in $\mathcal{R}_\alpha[a,b]$.

18. Show that $\bigcap\{\mathcal{R}_\alpha[a,b] : \alpha \text{ increasing}\} = C[a,b]$.

19. If $\mathcal{R}_\alpha[a,b] \supset S[a,b]$, show that $\alpha$ is continuous.

20. If $\alpha$ is continuous, show that $\int_a^b f\,d\alpha$ does not depend on the values of $f$ at any finite number of points. Is this still true if we change “finite” to “countable”? Explain.

21. Given a sequence $(x_n)$ of distinct points in $(a,b)$ and a sequence $(c_n)$ of positive numbers with $\sum_n c_n < \infty$, define an increasing function $\alpha$ on $[a,b]$ by setting $\alpha(x) = \sum_{n=1}^\infty c_n I(x - x_n)$, where $I(x) = 0$ for $x \le 0$ and $I(x) = 1$ for $x > 0$. Show that $\int_a^b f\,d\alpha = \sum_{n=1}^\infty c_n f(x_n)$ for every continuous function $f$ on $[a,b]$. [Hint: Given $\varepsilon > 0$, take $N$ sufficiently large so that $\beta(x) = \sum_{n=N+1}^\infty c_n I(x - x_n)$ satisfies $\beta(b) - \beta(a) < \varepsilon$.]

22. If $f \in \mathcal{R}_\alpha[a,b]$ with $m \le f \le M$, and if $\varphi$ is continuous on $[m,M]$, show that $\varphi \circ f \in \mathcal{R}_\alpha[a,b]$.

23. Suppose that $\varphi$ is a strictly increasing continuous function from $[c,d]$ onto $[a,b]$. Given $f \in \mathcal{R}_\alpha[a,b]$, show that $g = f \circ \varphi \in \mathcal{R}_\beta[c,d]$, where $\beta = \alpha \circ \varphi$. Moreover, $\int_c^d g\,d\beta = \int_a^b f\,d\alpha$.

24. As we have seen, $\chi_{\mathbb{Q}}$ is not Riemann integrable on $[0,1]$. The problem is that $\chi_{\mathbb{Q}}$ is “too discontinuous.” But what might that mean? Here is another example with uncountably many points of discontinuity, but this time Riemann integrable: Show that the set of discontinuities of $\chi_\Delta$ is precisely $\Delta$ (an uncountable set), but that $\chi_\Delta$ is nevertheless Riemann integrable on $[0,1]$. [Hint: $\Delta$ can be covered by finitely many intervals of arbitrarily small total length.]


The Space of Integrable Functions 

In this section we will examine the algebraic structure of the space of integrable functions $\mathcal{R}_\alpha[a,b]$, where $\alpha$ is increasing. As you might imagine, this examination will reduce to a study of certain elementary properties of the integral. Most of these properties are both easy to guess and easy to check. For this reason, we will relegate many of the details to the exercises. On the other hand, whereas some accounts give these elementary properties as corollaries of a “metatheorem,” we will give (or at least sketch) direct proofs wherever possible.

To begin, let’s check that $\mathcal{R}_\alpha[a,b]$ is a vector space, a lattice, and an algebra!

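Theorem 14.7. Let $f, g \in \mathcal{R}_\alpha[a,b]$ and let $c \in \mathbb{R}$. Then:

(i) $cf \in \mathcal{R}_\alpha[a,b]$ and $\int_a^b cf\,d\alpha = c\int_a^b f\,d\alpha$.

(ii) $f + g \in \mathcal{R}_\alpha[a,b]$ and $\int_a^b (f + g)\,d\alpha = \int_a^b f\,d\alpha + \int_a^b g\,d\alpha$.

(iii) $\int_a^b f\,d\alpha \le \int_a^b g\,d\alpha$ whenever $f \le g$.

(iv) $|f| \in \mathcal{R}_\alpha[a,b]$ and $\left|\int_a^b f\,d\alpha\right| \le \int_a^b |f|\,d\alpha \le \|f\|_\infty\,[\alpha(b) - \alpha(a)]$.

(v) $fg \in \mathcal{R}_\alpha[a,b]$ and $\left|\int_a^b fg\,d\alpha\right| \le \left(\int_a^b f^2\,d\alpha\right)^{1/2}\left(\int_a^b g^2\,d\alpha\right)^{1/2}$.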

Proof. (i): If $c \ge 0$, then clearly $U(cf,P) = c\,U(f,P)$, and similarly for lower sums. If, however, $c < 0$, then

$$U(cf,P) = |c|\,U(-f,P) = -|c|\,L(f,P) = c\,L(f,P).$$

(Why?) Again, the lower sum version is similar. In either case we get

$$U(cf,P) - L(cf,P) = |c|\,[U(f,P) - L(f,P)],$$

and this should be enough to convince you that $cf \in \mathcal{R}_\alpha[a,b]$. Now, for the equality of integrals, notice that

$$\int_a^b cf\,d\alpha = \inf_P U(cf,P) = \begin{cases} c\,\inf_P U(f,P) = c\int_a^b f\,d\alpha & \text{if } c \ge 0, \\ c\,\sup_P L(f,P) = c\int_a^b f\,d\alpha & \text{if } c < 0. \end{cases}$$

(ii): Consider the following rather strange looking claim:

$$L(f,P) + L(g,Q) \le L(f + g, P \cup Q) \le U(f + g, P \cup Q) \le U(f,P) + U(g,Q).$$

(Why does this work?) Since we are allowed to make independent choices of $P$ and $Q$, we can easily force $P \cup Q$ to “work” for $f + g$. Thus, $f + g \in \mathcal{R}_\alpha[a,b]$. And how about the integrals? Well, it follows from our claim that

$$\int_a^b f\,d\alpha + \int_a^b g\,d\alpha \le \underline{\int_a^b} (f + g)\,d\alpha \le \overline{\int_a^b} (f + g)\,d\alpha \le \int_a^b f\,d\alpha + \int_a^b g\,d\alpha.$$

The proof of (iii) is left as an exercise (see Exercise 1). 

(iv): From the triangle inequality, $\bigl||f(s)| - |f(t)|\bigr| \le |f(s) - f(t)|$, and so it follows that $\omega(|f|; I) \le \omega(f; I)$ for any interval $I$. In particular,

$$U(|f|,P) - L(|f|,P) \le U(f,P) - L(f,P).$$

Hence, $|f| \in \mathcal{R}_\alpha[a,b]$. Since $-f, f \le |f| \le \|f\|_\infty$, the integral inequality follows from (i) and (iii).

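(v): We first show that $f^2 \in \mathcal{R}_\alpha[a,b]$. Indeed, since

$$f(x)^2 - f(y)^2 = \bigl(f(x) + f(y)\bigr)\bigl(f(x) - f(y)\bigr),$$

we have $\omega(f^2; I) \le 2\|f\|_\infty\,\omega(f; I)$ for any interval $I$. Consequently,

$$U(f^2,P) - L(f^2,P) \le 2\|f\|_\infty\,[U(f,P) - L(f,P)].$$

Thus, $f^2 \in \mathcal{R}_\alpha[a,b]$ whenever $f \in \mathcal{R}_\alpha[a,b]$. That $\mathcal{R}_\alpha[a,b]$ is closed under more general products now follows from a little sleight of hand: $4fg = (f + g)^2 - (f - g)^2$. Hence, by (i), (ii), and the first part of this proof, we have $fg \in \mathcal{R}_\alpha[a,b]$.

Finally, the integral inequality follows from the Cauchy-Schwarz inequality for sums and Exercise 14. Since all three integrals in the inequality exist, we can find a single partition $P = \{x_0, \ldots, x_n\}$ such that each integral is approximated, to within a given $\varepsilon$, by a finite sum of the form $\sum_{i=1}^n \psi(t_i)\,\Delta\alpha_i$ (with $\psi = fg$, $f^2$, or $g^2$), where $t_i$ is any point in $[x_{i-1}, x_i]$. Thus,

$$
\begin{aligned}
\left|\int_a^b fg\,d\alpha\right| - \varepsilon &\le \left|\sum_{i=1}^n f(t_i)\,g(t_i)\,\Delta\alpha_i\right| \\
&\le \left(\sum_{i=1}^n f(t_i)^2\,\Delta\alpha_i\right)^{1/2} \left(\sum_{i=1}^n g(t_i)^2\,\Delta\alpha_i\right)^{1/2} \\
&\le \left(\int_a^b f^2\,d\alpha + \varepsilon\right)^{1/2} \left(\int_a^b g^2\,d\alpha + \varepsilon\right)^{1/2}.
\end{aligned}
$$

Since $\varepsilon > 0$ is arbitrary, the inequality in (v) follows. □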


Please note that Theorem 14.7 (v) need not hold for unbounded functions (or “improper” integrals). Indeed, the improper Riemann integral $\int_0^1 (1/\sqrt{x})\,dx$ exists, while $\int_0^1 (1/x)\,dx$ does not.

Theorem 14.7 tells us that $\mathcal{R}_\alpha[a,b]$ is a vector space, an algebra, and a lattice; in fact, $\mathcal{R}_\alpha[a,b]$ is a subspace, a subalgebra, and a sublattice of $B[a,b]$. Moreover, there are at least two natural choices for a norm on $\mathcal{R}_\alpha[a,b]$. We might simply use the sup-norm, or we might want to consider $\|f\| = \int_a^b |f|\,d\alpha$. While the latter expression has most of the trademarks of a norm and will actually prove useful in certain settings, it falls just short of being a norm: it typically only defines a semi-norm (see Exercises 25 and 26).

For now, let’s establish at least one good reason to consider the sup-norm: $\mathcal{R}_\alpha[a,b]$ is closed under uniform convergence. That is, $\mathcal{R}_\alpha[a,b]$ is a closed subspace of $B[a,b]$ and so is complete under the sup-norm.

Theorem 14.8. Let $(f_n)$ be a sequence in $\mathcal{R}_\alpha[a,b]$. If $(f_n)$ converges uniformly to $f$ on $[a,b]$, then $f \in \mathcal{R}_\alpha[a,b]$. Moreover, $\int_a^b f_n\,d\alpha \to \int_a^b f\,d\alpha$.

Proof. Given $\varepsilon > 0$, choose $k$ such that $\|f - f_n\|_\infty < \varepsilon$ whenever $n \ge k$. Now, since $f_k$ is integrable, we can find a partition $P$ of $[a,b]$ such that $U(f_k,P) - L(f_k,P) < \varepsilon$. From this we want to estimate $U(f,P) - L(f,P)$.

Now for any pair of points $s, t \in [a,b]$, the triangle inequality gives $|f(s) - f(t)| \le |f_k(s) - f_k(t)| + 2\varepsilon$. It follows that $\omega(f; I) \le \omega(f_k; I) + 2\varepsilon$ for any interval $I \subset [a,b]$. Consequently,

$$
\begin{aligned}
U(f,P) - L(f,P) &= \sum_{i=1}^n \omega(f; [x_{i-1}, x_i])\,\Delta\alpha_i \\
&\le \sum_{i=1}^n \omega(f_k; [x_{i-1}, x_i])\,\Delta\alpha_i + 2\varepsilon \sum_{i=1}^n \Delta\alpha_i \\
&= U(f_k,P) - L(f_k,P) + 2\varepsilon\,[\alpha(b) - \alpha(a)] \\
&< \varepsilon + 2\varepsilon\,[\alpha(b) - \alpha(a)].
\end{aligned}
$$

Thus, since $\varepsilon$ is arbitrary, $f \in \mathcal{R}_\alpha[a,b]$. To see that $\int_a^b f_n\,d\alpha \to \int_a^b f\,d\alpha$, we now just estimate

$$\left|\int_a^b (f_n - f)\,d\alpha\right| \le \int_a^b |f_n - f|\,d\alpha \le \|f_n - f\|_\infty\,[\alpha(b) - \alpha(a)] \to 0 \quad \text{as } n \to \infty.$$

□
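Here is a numerical look at Theorem 14.8, under choices of our own:

- $f_n(x) = \sqrt{x^2 + 1/n^2}$ converges uniformly to $f(x) = |x|$ on $[-1,1]$, with $\|f_n - f\|_\infty \le 1/n$;
- the integrator is $\alpha(x) = x^3$.

On any fixed partition with fixed tags, the Stieltjes sums obey the same estimate as the integrals:

$$|S(f_n) - S(f)| \le \|f_n - f\|_\infty\,[\alpha(1) - \alpha(-1)] = 2/n.$$

Moreover, $S(f)$ approximates $\int_{-1}^1 |x|\,d(x^3) = \int_{-1}^1 3x^2|x|\,dx = \tfrac32$. The helper `rs_sum` (midpoint tags) is hypothetical.

```python
import math

def rs_sum(f, alpha, a, b, N=20000):
    """Riemann-Stieltjes sum with midpoint tags over the uniform partition of [a, b] into N pieces."""
    total = 0.0
    for i in range(N):
        x0 = a + (b - a) * i / N
        x1 = a + (b - a) * (i + 1) / N
        total += f((x0 + x1) / 2) * (alpha(x1) - alpha(x0))
    return total

alpha = lambda x: x ** 3
S_f = rs_sum(abs, alpha, -1.0, 1.0)              # approximates the integral of |x| d(x^3) = 3/2
S_n = {}
for n in (1, 10, 100, 1000):
    fn = lambda x, n=n: math.sqrt(x * x + 1.0 / n ** 2)
    S_n[n] = rs_sum(fn, alpha, -1.0, 1.0)
    print(n, round(S_n[n], 6), round(abs(S_n[n] - S_f), 6), "bound", 2 / n)
```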


Notice that $C[a,b]$ is a subspace (as well as a subalgebra and a sublattice) of $\mathcal{R}_\alpha[a,b]$ for $\alpha$ increasing. It follows from Theorem 14.8 that $C[a,b]$ is closed in $\mathcal{R}_\alpha[a,b]$ when $\mathcal{R}_\alpha[a,b]$ is endowed with the sup-norm.

On the other hand, if $\alpha$ is continuous, and if we supply $\mathcal{R}_\alpha[a,b]$ with the semi-norm $\|f\| = \int_a^b |f|\,d\alpha$, then $C[a,b]$ is a dense subspace of $\mathcal{R}_\alpha[a,b]$.

Theorem 14.9. Let $\alpha$ be continuous and increasing. Given $f \in \mathcal{R}_\alpha[a,b]$ and $\varepsilon > 0$, there exist

(i) a step function $h$ on $[a,b]$ with $\|h\|_\infty \le \|f\|_\infty$ such that $\int_a^b |f - h|\,d\alpha < \varepsilon$, and

(ii) a continuous function $g$ on $[a,b]$ with $\|g\|_\infty \le \|f\|_\infty$ such that $\int_a^b |f - g|\,d\alpha < \varepsilon$.

Proof. From Theorem 14.4, we can find a partition $P = \{x_0, \ldots, x_n\}$ such that

$$U(f,P) - L(f,P) = \sum_{i=1}^n \omega(f; [x_{i-1}, x_i])\,\Delta\alpha_i < \varepsilon.$$

For each $i = 1, \ldots, n$, choose $t_i \in [x_{i-1}, x_i)$ and define a step function $h$ by setting $h(x) = f(t_i)$ for $x_{i-1} \le x < x_i$, for $i = 1, \ldots, n$, and $h(x_n) = f(t_n)$. Clearly, $\|h\|_\infty \le \|f\|_\infty$. Since $\alpha$ is continuous, we have $h \in \mathcal{R}_\alpha[a,b]$. From Exercise 10 it follows that

$$
\begin{aligned}
\int_a^b |f - h|\,d\alpha &= \sum_{i=1}^n \int_{x_{i-1}}^{x_i} |f(x) - f(t_i)|\,d\alpha(x) \\
&\le \sum_{i=1}^n \omega(f; [x_{i-1}, x_i])\,\Delta\alpha_i \\
&= U(f,P) - L(f,P) < \varepsilon,
\end{aligned}
$$

which proves (i).

To prove (ii), we use the fact that $\alpha$ is uniformly continuous. Since $n$ is fixed, we may choose $0 < \delta < \min\{\Delta x_i/2 : i = 1, \dots, n\}$ such that $\alpha$ has oscillation less than $\varepsilon/(n+1)$ on each of the intervals $[x_i - \delta, x_i + \delta] \cap [a,b]$. Now let $g$ be the polygonal function that agrees with $h$ at each of the nodes

$$
x_0,\ x_0 + \delta,\ x_1 - \delta,\ x_1 + \delta,\ \dots,\ x_n - \delta,\ x_n.
$$

(Thus $g$ is the piecewise linear continuous function that agrees with $h$ on each of the intervals $[x_{i-1} + \delta, x_i - \delta]$ and is linear on each of the intervals $[x_i - \delta, x_i + \delta]$.)
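Then $\|g\|_\infty \le \|h\|_\infty \le \|f\|_\infty$ (why?), $g \in \mathcal{R}_\alpha[a,b]$, and

$$
\begin{aligned}
\int_a^b |h - g|\,d\alpha &= \int_{x_0}^{x_0+\delta} |h - g|\,d\alpha + \sum_{i=1}^{n-1} \int_{x_i-\delta}^{x_i+\delta} |h - g|\,d\alpha + \int_{x_n-\delta}^{x_n} |h - g|\,d\alpha \\
&\le 2\|f\|_\infty \frac{\varepsilon}{n+1} + \cdots + 2\|f\|_\infty \frac{\varepsilon}{n+1} = 2\varepsilon\|f\|_\infty.
\end{aligned}
$$

Finally, we use the triangle inequality to conclude that

$$
\int_a^b |f - g|\,d\alpha \le \int_a^b |f - h|\,d\alpha + \int_a^b |h - g|\,d\alpha < \varepsilon + 2\varepsilon\|f\|_\infty. \qquad \square
$$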
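The estimate in part (i) can be checked numerically in the simplest case. The sketch below is only an illustration under assumptions that are not in the text: it takes $f(x) = \sqrt{x}$ on $[0,1]$, the weight $\alpha(x) = x$ (so the integrals are ordinary Riemann integrals), the uniform partition into $n$ pieces, and $t_i = x_{i-1}$. Because $f$ is increasing, each $\int_{x_{i-1}}^{x_i} |f - h|\,dx$ can be computed exactly from the antiderivative $\tfrac23 x^{3/2}$, and the oscillation on each piece is $f(x_i) - f(x_{i-1})$. The helper name `step_error_and_gap` is hypothetical.

```python
import math


def sqrt_antiderivative(x):
    # (2/3) * x**(3/2) is an antiderivative of sqrt(x)
    return 2.0 / 3.0 * x ** 1.5


def step_error_and_gap(n):
    """For f = sqrt on [0, 1] with alpha(x) = x and P uniform with n pieces,
    let h = f(x_{i-1}) on [x_{i-1}, x_i).  Return the pair
    (integral of |f - h|, U(f, P) - L(f, P))."""
    P = [i / n for i in range(n + 1)]
    err = 0.0
    gap = 0.0
    for x0, x1 in zip(P, P[1:]):
        dx = x1 - x0
        # f is increasing, so f - h >= 0 on [x0, x1]; integrate it exactly
        err += (sqrt_antiderivative(x1) - sqrt_antiderivative(x0)) - math.sqrt(x0) * dx
        # the oscillation of an increasing f on [x0, x1] is f(x1) - f(x0)
        gap += (math.sqrt(x1) - math.sqrt(x0)) * dx
    return err, gap


for n in (4, 16, 64):
    err, gap = step_error_and_gap(n)
    print(n, err, gap)
```

For every $n$ the first number stays below the second, as the proof predicts; in this example the second number telescopes to exactly $1/n$.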


EXERCISES 

25. Construct a nonconstant increasing function $\alpha$ and a nonzero continuous function $f \in \mathcal{R}_\alpha[a,b]$ such that $\int_a^b |f|\,d\alpha = 0$. Is it possible to choose $\alpha$ to also be continuous? Explain.

26. If $f$ is continuous on $[a,b]$, and if $f(x_0) \ne 0$ for some $x_0$, show that $\int_a^b |f(x)|\,dx \ne 0$. Conclude that $\|f\| = \int_a^b |f(x)|\,dx$ defines a norm on $C[a,b]$. Does it define a norm on all of $\mathcal{R}[a,b]$? Explain.

27. Give an example of a sequence of Riemann integrable functions on [ 0, 1 ] that 
converges pointwise to a nonintegrable function. 


Integrators of Bounded Variation 

We next extend the definition of the Riemann-Stieltjes integral to accept integrators that are not necessarily increasing. In particular, we would like to use differences of increasing weights, that is, functions of bounded variation. The only problem we face is that upper and lower sums will no longer be monotone. To generalize the integral, then, we need only take more general sums.

Throughout this section, unless otherwise specified, $f$ and $\alpha$ will denote arbitrary, bounded, real-valued functions on $[a,b]$.

Given a partition $P = \{x_0, \dots, x_n\}$ of $[a,b]$, let $T = \{t_1, \dots, t_n\}$ denote an arbitrary selection of points from $[a,b]$ with $t_i \in [x_{i-1}, x_i]$. We call

$$
S(f,P,T) = \sum_{i=1}^n f(t_i)[\alpha(x_i) - \alpha(x_{i-1})]
$$

a Riemann-Stieltjes sum for $f$. If we need to display the dependence on $\alpha$, we will write $S_\alpha(f,P,T)$.
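Riemann-Stieltjes sums are easy to compute directly. The following Python sketch is a direct transcription of the definition of $S_\alpha(f,P,T)$; the function name `rs_sum` and the convention of passing $P$ and $T$ as plain lists are our own choices, not notation from the text. Note that $\Delta\alpha_i$ is allowed to be negative, so nothing here assumes that $\alpha$ is increasing.

```python
import math
from typing import Callable, Sequence


def rs_sum(f: Callable[[float], float],
           alpha: Callable[[float], float],
           P: Sequence[float],
           T: Sequence[float]) -> float:
    """Return S_alpha(f, P, T) = sum_{i=1}^n f(t_i) [alpha(x_i) - alpha(x_{i-1})]."""
    if len(T) != len(P) - 1:
        raise ValueError("need exactly one point t_i per subinterval")
    total = 0.0
    for x_prev, x_i, t_i in zip(P, P[1:], T):
        if not (x_prev <= t_i <= x_i):
            raise ValueError(f"t_i = {t_i} is not in [{x_prev}, {x_i}]")
        # Delta alpha_i may be negative when alpha is not increasing
        total += f(t_i) * (alpha(x_i) - alpha(x_prev))
    return total


# f(x) = x, alpha(x) = x^2, P = {0, 1/2, 1}, right endpoints as T
print(rs_sum(lambda x: x, lambda x: x * x, [0.0, 0.5, 1.0], [0.5, 1.0]))  # 0.875
# a non-monotone integrator: with f = 1 the sum telescopes to cos(pi) - cos(0), about -2
print(rs_sum(lambda x: 1.0, math.cos, [0.0, 1.0, 2.0, math.pi], [0.0, 1.5, 3.0]))
```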

In this general setting we say that $f$ is Riemann-Stieltjes integrable with respect to $\alpha$, and write $f \in \mathcal{R}_\alpha[a,b]$, if and only if there exists a number $I \in \mathbb{R}$ such that, for every $\varepsilon > 0$, there is a partition $P_\varepsilon$ for which $|S(f,P,T) - I| < \varepsilon$ for all refinements $P \supset P_\varepsilon$ and all selections of points $T$. If such a number $I$ exists, then it is easy to see that it must also be unique; in this case, we define $\int_a^b f\,d\alpha = I$. If we should need to distinguish this integral from an integral arising from another definition, we will write $(RS)\int_a^b f\,d\alpha$.

If $\alpha$ is increasing, then $L(f,P) \le S(f,P,T) \le U(f,P)$ for any $P$ and any $T$. The fact that we have complete freedom in choosing the points $T$ at which $f$ is evaluated means that the sum $S(f,P,T)$ can be made arbitrarily close to either $L(f,P)$ or $U(f,P)$ by choosing $T$ appropriately. Given this, it is not hard to see that the refinement definition of the integral coincides with our earlier definition when $\alpha$ is increasing (see Exercises 14 and 15).

For integrators that are not increasing, though, no such simple comparison of sums is available. If we permit $\Delta\alpha_i$ to take on negative values, then we sacrifice the monotonicity of upper and lower sums. The more general Riemann-Stieltjes sums $S(f,P,T)$ are needed in this case; the extra freedom in choosing $T$ compensates for the lack of monotonicity of sums.


EXERCISES 

▷ 28. If $\alpha$ is increasing, show that the definition of the integral given above coincides with our previous definition (in terms of upper and lower sums).

▷ 29. Show that $|S_\alpha(f,P,T)| \le \|f\|_\infty V(\alpha,P)$.

30. If $\alpha$ is a step function (and not necessarily increasing) and $f$ is continuous, derive a formula for $(RS)\int_a^b f\,d\alpha$. [Hint: See Exercise 7.]

31. Let $a < c < b$, and suppose that $f \in \mathcal{R}_\alpha[a,c] \cap \mathcal{R}_\alpha[c,b]$. Show that $f \in \mathcal{R}_\alpha[a,b]$ and that $\int_a^b f\,d\alpha = \int_a^c f\,d\alpha + \int_c^b f\,d\alpha$. In fact, if any two of these integrals exist, then so does the third and the equation above still holds.

32. If $(RS)\int_a^b f\,d\alpha$ exists, and if $a < c < b$, does $(RS)\int_a^c f\,d\alpha$ exist? [Hint: The answer is “yes,” but this is harder than the previous exercise.]

33. If $(RS)\int_a^b f\,d\alpha$ exists, show that it equals $\lim_{n\to\infty} S(f,P_n,T_n)$ for some increasing sequence of partitions $(P_n)$ and any $(T_n)$.

34. Just as with other limits, the refinement integral admits a “Cauchy criterion” for convergence: Show that $f \in \mathcal{R}_\alpha[a,b]$ if and only if, given $\varepsilon > 0$, there is a partition $P_\varepsilon$ such that $|S(f,P_1,T_1) - S(f,P_2,T_2)| < \varepsilon$ for any pair of refinements $P_1, P_2 \supset P_\varepsilon$ and any $T_1, T_2$. [Hint: For the backward implication, choose a particular sequence of partitions for which $S(f,P_n,T_n)$ converges to, say, $I$. Now show that $I$ “works” in the definition of $(RS)\int_a^b f\,d\alpha$.]

35. Let $P = \{x_0, \dots, x_n\} \subset \{y_0, \dots, y_m\} = P^*$ be partitions of $[a,b]$. Show that $S(f,P,T) - S(f,P^*,T^*) = \sum_{j=1}^m [f(t_i) - f(s_j)][\alpha(y_j) - \alpha(y_{j-1})]$, where $T^* = \{s_1, \dots, s_m\}$ and, in the $j$th term, $s_j$ and $t_i$ are in the same interval $[x_{i-1}, x_i]$. [Hint: Draw a picture!] Use this to give a direct proof, based on Exercise 34, that $C[a,b] \subset \mathcal{R}_\alpha[a,b]$ whenever $\alpha$ is of bounded variation.
Riemann-Stieltjes sums are easier to work with than you might suspect. For example, it is now quite easy to see that the integral is linear. Indeed, since the sums are linear, $S(cf + dg, P, T) = cS(f,P,T) + dS(g,P,T)$, we have

$$
\left| S(cf + dg, P, T) - \left[ c\int_a^b f\,d\alpha + d\int_a^b g\,d\alpha \right] \right| \le |c|\left| S(f,P,T) - \int_a^b f\,d\alpha \right| + |d|\left| S(g,P,T) - \int_a^b g\,d\alpha \right|.
$$

That is, $I = c\int_a^b f\,d\alpha + d\int_a^b g\,d\alpha$ “works” and so becomes the only possible value for $\int_a^b (cf + dg)\,d\alpha$. Thus, $\mathcal{R}_\alpha[a,b]$ is at least a vector space.

Absolute values and products will not be so easy to come by, though. Again, we need the integral to be monotone (more or less), and it is not necessarily going to cooperate. In fact, one of our goals is to find an upper estimate for $|\int_a^b f\,d\alpha|$ in terms of $\|f\|_\infty$. This was simple for increasing weights $\alpha$, but it is not so transparent in general. (Recall the proof of Theorem 14.7 (iv).)

On the other hand, certain other properties of the integral are still with us. For example, it is not at all hard to show that the integral is also “linear in $\alpha$.” That is, if $f \in \mathcal{R}_\alpha \cap \mathcal{R}_\beta$, then $f \in \mathcal{R}_{\alpha\pm\beta}$ and $\int_a^b f\,d(\alpha \pm \beta) = \int_a^b f\,d\alpha \pm \int_a^b f\,d\beta$. Rather than present several repetitious proofs, let’s settle all such issues at once.


Theorem 14.10. (Integration by Parts) $f \in \mathcal{R}_\alpha[a,b]$ if and only if $\alpha \in \mathcal{R}_f[a,b]$ and, in either case,

$$
\int_a^b f\,d\alpha + \int_a^b \alpha\,df = f(b)\alpha(b) - f(a)\alpha(a).
$$


proof. The “if and only if” is a mirage! Since the statement is clearly symmetric in $\alpha$ and $f$, we need only establish the forward implication. So, suppose that $f \in \mathcal{R}_\alpha[a,b]$ and let $\varepsilon > 0$. Choose a partition $P_\varepsilon$ so that $|S_\alpha(f,P,T) - \int_a^b f\,d\alpha| < \varepsilon$ for all $P \supset P_\varepsilon$ and all $T$.

Fix $P \supset P_\varepsilon$ and a selection of points $T$. The idea is to write $S_f(\alpha,P,T)$ in terms of $S_\alpha(f,P',T')$, where $P' \supset P$ (and hence $P' \supset P_\varepsilon$). First,

$$
\begin{aligned}
S_f(\alpha,P,T) &= \sum_{i=1}^n \alpha(t_i)[f(x_i) - f(x_{i-1})] \\
&= \sum_{i=1}^n f(x_i)\alpha(t_i) - \sum_{i=0}^{n-1} f(x_i)\alpha(t_{i+1}) \\
&= -\sum_{i=0}^n f(x_i)[\alpha(t_{i+1}) - \alpha(t_i)] - f(x_0)\alpha(t_0) + f(x_n)\alpha(t_{n+1}),
\end{aligned}
$$

where we have introduced $t_0 = a$ and $t_{n+1} = b$ (since a partition has to include $a$ and $b$). That is, if we set $P' = \{t_0, \dots, t_{n+1}\}$ and $T' = P$, then

$$
S_f(\alpha,P,T) = f(b)\alpha(b) - f(a)\alpha(a) - S_\alpha(f,P',T'),
$$
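which is almost what we want. We wanted $P' \supset P$, and this is easy to fix:

$$
\begin{aligned}
S_\alpha(f,P',T') &= \sum_{i=0}^n f(x_i)[\alpha(t_{i+1}) - \alpha(t_i)] \\
&= \sum_{i=0}^n f(x_i)[\alpha(t_{i+1}) - \alpha(x_i)] + \sum_{i=0}^n f(x_i)[\alpha(x_i) - \alpha(t_i)] \\
&= S_\alpha(f,P'',T''),
\end{aligned}
$$

where $P'' = \{t_0, x_0, t_1, x_1, \dots, t_n, x_n, t_{n+1}\} \supset P$ and $T'' = \{x_0, x_0, x_1, x_1, \dots, x_n, x_n\}$. Hence,

$$
\left| S_f(\alpha,P,T) - \left[ f(b)\alpha(b) - f(a)\alpha(a) - \int_a^b f\,d\alpha \right] \right| = \left| \int_a^b f\,d\alpha - S_\alpha(f,P'',T'') \right| < \varepsilon.
$$

That is, $\alpha \in \mathcal{R}_f[a,b]$ and $\int_a^b \alpha\,df = f(b)\alpha(b) - f(a)\alpha(a) - \int_a^b f\,d\alpha$. $\square$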
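The heart of the proof of Theorem 14.10 is a purely algebraic identity (a summation by parts): for any $f$ and $\alpha$, any $P$, and any $T$, we have $S_f(\alpha,P,T) + S_\alpha(f,P',T') = f(b)\alpha(b) - f(a)\alpha(a)$, where $P' = \{a, t_1, \dots, t_n, b\}$ and $T' = P$. The sketch below checks this identity numerically for a randomly chosen partition and selection of points; the choices $f(x) = x^2$ and $\alpha = \sin$ on $[0,2]$, and the helper name `by_parts_gap`, are illustrative assumptions. Since the identity is exact, the gap should vanish up to floating-point rounding.

```python
import math
import random


def rs_sum(f, alpha, P, T):
    # S_alpha(f, P, T) = sum_i f(t_i) [alpha(x_i) - alpha(x_{i-1})]
    return sum(f(t) * (alpha(x1) - alpha(x0)) for x0, x1, t in zip(P, P[1:], T))


def by_parts_gap(f, alpha, P, T):
    """Return S_f(alpha, P, T) + S_alpha(f, P', T') - [f(b) alpha(b) - f(a) alpha(a)],
    with P' = {a, t_1, ..., t_n, b} and T' = P as in the proof of Theorem 14.10."""
    a, b = P[0], P[-1]
    P_prime = [a] + list(T) + [b]
    T_prime = list(P)  # x_i lies in [t_i, t_{i+1}], so P can serve as the tags
    total = rs_sum(alpha, f, P, T) + rs_sum(f, alpha, P_prime, T_prime)
    return total - (f(b) * alpha(b) - f(a) * alpha(a))


random.seed(0)
P = sorted([0.0, 2.0] + [random.uniform(0.0, 2.0) for _ in range(20)])
T = [random.uniform(x0, x1) for x0, x1 in zip(P, P[1:])]
print(abs(by_parts_gap(lambda x: x * x, math.sin, P, T)) < 1e-12)  # True
```

Letting the mesh shrink in this identity is exactly what turns it into the integration by parts formula.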

Now we just sit back and reap the benefits. 

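Corollary 14.11. If $f \in \mathcal{R}_\alpha \cap \mathcal{R}_\beta$, then $f \in \mathcal{R}_{\alpha\pm\beta}$ and

$$
\int_a^b f\,d(\alpha \pm \beta) = \int_a^b f\,d\alpha \pm \int_a^b f\,d\beta.
$$

Corollary 14.12. If $f$ is monotone and $\alpha$ is continuous on $[a,b]$, then $f \in \mathcal{R}_\alpha[a,b]$.


Corollary 14.13. If $\alpha \in BV[a,b]$, then $C[a,b] \subset \mathcal{R}_\alpha[a,b]$. Obversely, if $\alpha \in C[a,b]$, then $BV[a,b] \subset \mathcal{R}_\alpha[a,b]$. In particular, continuous functions and functions of bounded variation are Riemann integrable on $[a,b]$.

proof. If $\alpha = \beta - \gamma$, where $\beta$ and $\gamma$ are increasing, then

$$
C[a,b] \subset \mathcal{R}_\beta[a,b] \cap \mathcal{R}_\gamma[a,b] \subset \mathcal{R}_{\beta-\gamma}[a,b] = \mathcal{R}_\alpha[a,b]. \qquad \square
$$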


We would like to go one step further in the proof of Corollary 14.13 and ask whether $\mathcal{R}_\beta[a,b] \cap \mathcal{R}_\gamma[a,b] = \mathcal{R}_\alpha[a,b]$. This would truly reduce the study of bounded variation integrators to the case of increasing integrators. For example, since each of $\mathcal{R}_\beta$ and $\mathcal{R}_\gamma$ is closed under products, we would have that $\mathcal{R}_\alpha$ is closed under products, too. Unfortunately, the formula is not true for just any such splitting $\alpha = \beta - \gamma$ (take $\alpha = 0$ and $\beta = \gamma$, any nonconstant increasing function), but it is true for the canonical decomposition.

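Theorem 14.14. Let $\alpha \in BV[a,b]$, and let $\beta(x) = V_a^x \alpha$. (Recall that both $\beta$ and $\beta - \alpha$ are increasing.) Then $\mathcal{R}_\alpha[a,b] = \mathcal{R}_\beta[a,b] \cap \mathcal{R}_{\beta-\alpha}[a,b]$.

proof. From Corollary 14.11, it suffices to show that $\mathcal{R}_\alpha[a,b] \subset \mathcal{R}_\beta[a,b]$. So, let $\varepsilon > 0$, and let $f \in \mathcal{R}_\alpha[a,b]$.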
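We first make an observation about $\alpha$ and $\beta$. Since $\alpha$ is of bounded variation, we may choose a partition $P_\varepsilon$ so that $\beta(b) - \beta(a) = V_a^b \alpha \ge V(\alpha,P) > V_a^b \alpha - \varepsilon$ for all partitions $P \supset P_\varepsilon$. That is, if $P = \{x_0, \dots, x_n\} \supset P_\varepsilon$, then

$$
\begin{aligned}
0 \le \beta(b) - \beta(a) - V(\alpha,P) &= \sum_{i=1}^n [\beta(x_i) - \beta(x_{i-1})] - \sum_{i=1}^n |\alpha(x_i) - \alpha(x_{i-1})| \\
&= \sum_{i=1}^n \{\Delta\beta_i - |\Delta\alpha_i|\} < \varepsilon.
\end{aligned}
$$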

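Since $f \in \mathcal{R}_\alpha[a,b]$, and since we are allowed to augment $P_\varepsilon$, we may assume that $P_\varepsilon$ also satisfies $|S_\alpha(f,P,T) - \int_a^b f\,d\alpha| < \varepsilon/2$ for any $P \supset P_\varepsilon$ and any $T$.

In particular,

$$
|S_\alpha(f,P,T) - S_\alpha(f,P,T^*)| < \varepsilon \quad\text{for any } P \supset P_\varepsilon \text{ and any } T, T^*.
$$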

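Once $P$ is fixed, we can force this difference to look like the difference of upper and lower sums for $\beta$ by taking a suitable choice of $T$ and $T^*$. Specifically, given $P$ and $\varepsilon > 0$, and writing $M_i$ and $m_i$ for the supremum and infimum of $f$ on $[x_{i-1}, x_i]$, choose $T$ and $T^*$ so that

$$
\begin{aligned}
S_\alpha(f,P,T) - S_\alpha(f,P,T^*) &= \sum_{i=1}^n [f(t_i) - f(t_i^*)]\,\Delta\alpha_i \\
&\ge \sum_{i=1}^n (M_i - m_i - \varepsilon)\,|\Delta\alpha_i| \\
&\ge \sum_{i=1}^n (M_i - m_i)\,|\Delta\alpha_i| - \varepsilon V_a^b \alpha.
\end{aligned}
$$

(Please note the absolute values! Why does this work?)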

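Combining these observations, we now compare $U_\beta(f,P) - L_\beta(f,P)$ and $S_\alpha(f,P,T) - S_\alpha(f,P,T^*)$:

$$
\begin{aligned}
U_\beta(f,P) - L_\beta(f,P) &= \sum_{i=1}^n (M_i - m_i)\,\Delta\beta_i \\
&= \sum_{i=1}^n (M_i - m_i)\{\Delta\beta_i - |\Delta\alpha_i|\} + \sum_{i=1}^n (M_i - m_i)\,|\Delta\alpha_i| \\
&\le 2\|f\|_\infty\varepsilon + S_\alpha(f,P,T) - S_\alpha(f,P,T^*) + \varepsilon V_a^b \alpha \\
&< 2\|f\|_\infty\varepsilon + \varepsilon + \varepsilon V_a^b \alpha.
\end{aligned}
$$

Thus, $f \in \mathcal{R}_\beta[a,b]$. $\square$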

Corollary 14.15. If $\alpha \in BV[a,b]$, then $\mathcal{R}_\alpha[a,b]$ is a vector space, an algebra, and a lattice.

Although an upper estimate on $|\int_a^b f\,d\alpha|$ is hard to come by in general, an easy estimate is available when $\alpha$ is of bounded variation.
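Theorem 14.16. Let $\alpha \in BV[a,b]$ and let $\beta(x) = V_a^x \alpha$. Then, for any $f \in \mathcal{R}_\alpha[a,b]$,

$$
\left| \int_a^b f\,d\alpha \right| \le \int_a^b |f|\,d\beta \le \|f\|_\infty V_a^b \alpha.
$$

proof. First notice that if $f \in \mathcal{R}_\alpha$, then $f \in \mathcal{R}_\beta$ and hence $|f| \in \mathcal{R}_\beta$ (since $\beta$ is increasing). So, at least both integrals in the inequality exist. Next, recall that $|\alpha(y) - \alpha(x)| \le \beta(y) - \beta(x)$ for any $x < y$. Thus,

$$
|S_\alpha(f,P,T)| \le \sum_{i=1}^n |f(t_i)|\,|\Delta\alpha_i| \le \sum_{i=1}^n |f(t_i)|\,\Delta\beta_i = S_\beta(|f|,P,T).
$$

It now follows that

$$
\left| \int_a^b f\,d\alpha \right| \le \int_a^b |f|\,d\beta \le \|f\|_\infty[\beta(b) - \beta(a)] = \|f\|_\infty V_a^b \alpha. \qquad \square
$$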
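Here is a small numerical check of the chain of inequalities in the proof of Theorem 14.16, for the assumed example $\alpha = \sin$ on $[0, 2\pi]$. Its variation function can be worked out by hand: $\beta(x) = \sin x$ on $[0,\pi/2]$, $\beta(x) = 2 - \sin x$ on $[\pi/2, 3\pi/2]$, and $\beta(x) = 4 + \sin x$ on $[3\pi/2, 2\pi]$, so that $V_0^{2\pi}\alpha = 4$. The integrand $f(x) = \cos 3x$ (with $\|f\|_\infty = 1$), the random partitions, and the helper names are illustrative choices only.

```python
import math
import random


def alpha(x):
    return math.sin(x)


def beta(x):
    # beta(x) = V_0^x sin for 0 <= x <= 2*pi, worked out by hand
    if x <= math.pi / 2:
        return math.sin(x)
    if x <= 3 * math.pi / 2:
        return 2 - math.sin(x)
    return 4 + math.sin(x)


def rs_sum(f, g, P, T):
    # S_g(f, P, T) = sum_i f(t_i) [g(x_i) - g(x_{i-1})]
    return sum(f(t) * (g(x1) - g(x0)) for x0, x1, t in zip(P, P[1:], T))


def chain_holds(f, f_sup, P, T, tol=1e-12):
    """Check |Delta alpha_i| <= Delta beta_i on each piece and
    |S_alpha(f,P,T)| <= S_beta(|f|,P,T) <= f_sup * V_0^{2 pi} alpha."""
    pieces_ok = all(abs(alpha(x1) - alpha(x0)) <= beta(x1) - beta(x0) + tol
                    for x0, x1 in zip(P, P[1:]))
    lhs = abs(rs_sum(f, alpha, P, T))
    mid = rs_sum(lambda x: abs(f(x)), beta, P, T)
    rhs = f_sup * (beta(2 * math.pi) - beta(0.0))
    return pieces_ok and lhs <= mid + tol and mid <= rhs + tol


random.seed(2)
b = 2 * math.pi
P = sorted([0.0, b] + [random.uniform(0.0, b) for _ in range(15)])
T = [random.uniform(x0, x1) for x0, x1 in zip(P, P[1:])]
print(chain_holds(lambda x: math.cos(3 * x), 1.0, P, T))  # True
```

The small tolerance `tol` only absorbs floating-point rounding; mathematically each inequality holds exactly.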

Corollary 14.17. If $\alpha \in BV[a,b]$, then $f \mapsto \int_a^b f\,d\alpha$ is a continuous, linear map on $C[a,b]$. Dually, if $f \in C[a,b]$, then $\alpha \mapsto \int_a^b f\,d\alpha$ is a continuous, linear map on $BV[a,b]$. In fact, $(f, \alpha) \mapsto \int_a^b f\,d\alpha$ is a continuous bilinear form on $C[a,b] \times BV[a,b]$.

proof. The linearity, in either case, is obvious. To prove continuity, then, we only need to appeal to Theorem 8.20. That is, it suffices to note that each map is Lipschitz. But $|\int_a^b f\,d\alpha| \le \|f\|_\infty V_a^b \alpha \le \|f\|_\infty \|\alpha\|_{BV}$. $\square$

Theorem 14.16 is an important result, so it couldn’t hurt to sketch a second proof of the inequality. Recall that if $p$ and $n$ are the positive and negative variations of $\alpha$, then $\alpha = p - n + \alpha(a)$ and $\beta = p + n$. Now see if you can fill in the details to the following short proof:

$$
\left| \int_a^b f\,d\alpha \right| = \left| \int_a^b f\,dp - \int_a^b f\,dn \right| \le \int_a^b |f|\,dp + \int_a^b |f|\,dn = \int_a^b |f|\,d\beta.
$$

Since $d\alpha = dp - dn$ while $d\beta = dp + dn$, we might consider writing $d\beta = |d\alpha|$. With this suggestive notation, our integral inequality becomes

$$
\left| \int_a^b f\,d\alpha \right| \le \int_a^b |f|\,|d\alpha|.
$$


If $\alpha'$ exists and is Riemann integrable, then Theorem 14.6 would further suggest that $|d\alpha(t)|$ should mean $|\alpha'(t)|\,dt$. Said another way, if $\alpha'$ exists and is Riemann integrable, then it seems reasonable to conjecture that $\beta'$ also exists and equals $|\alpha'|$. We will have more to say about this conjecture later in the chapter.
EXERCISES

▷ 36. If $\alpha \in BV[a,b]$ and $f \in \mathcal{R}_\alpha[a,b]$, show that $f \in \mathcal{R}_\alpha[c,d]$ for every subinterval $[c,d] \subset [a,b]$.

37. Assume that $f'$ is continuous. Use integration by parts to prove:

(a) $\sum_{k=1}^n f(k) = [n]\,f(n) - \int_1^n f'(x)\,[x]\,dx$,

(b) $\sum_{k=1}^{2n} (-1)^k f(k) = \int_1^{2n} f'(x)\,([x] - 2[x/2])\,dx$,

where $n$ is an integer and $[x]$ is the greatest integer in $x$.

38. Let $f \in BV[0, 2\pi]$ with $f(0) = f(2\pi)$. Show that both $\int_0^{2\pi} f(x)\sin nx\,dx$ and $\int_0^{2\pi} f(x)\cos nx\,dx$ exist and each integral is at most $(1/n)V_0^{2\pi} f$ in absolute value. (Conclusion: A periodic function of bounded variation has a Fourier series, and the terms of the series tend to 0.)

39. Given $\alpha \in BV[a,b]$, let $p$ and $n$ be the positive and negative variations of $\alpha$. Show that $\mathcal{R}_\alpha = \mathcal{R}_p \cap \mathcal{R}_n$ and that $\int_a^b f\,d\alpha = \int_a^b f\,dp - \int_a^b f\,dn$ for any $f \in \mathcal{R}_\alpha$.

▷ 40. If $\alpha \in BV[a,b]$, show that $\mathcal{R}_\alpha[a,b]$ is a closed subspace of $B[a,b]$. Specifically, if $(f_n)$ is a sequence in $\mathcal{R}_\alpha[a,b]$ that converges uniformly to $f$ on $[a,b]$, show that $f \in \mathcal{R}_\alpha[a,b]$ and that $\int_a^b f_n\,d\alpha \to \int_a^b f\,d\alpha$.

41. Suppose that $(\alpha_n)$ is a sequence in $BV[a,b]$ and that $V_a^b(\alpha_n - \alpha) \to 0$. Show that $\int_a^b f\,d\alpha_n \to \int_a^b f\,d\alpha$ for all $f \in C[a,b]$.

42. Suppose that $\varphi$ is a strictly increasing continuous function from $[c,d]$ onto $[a,b]$. Given $\alpha \in BV[a,b]$ and $f \in \mathcal{R}_\alpha[a,b]$, show that $\beta = \alpha \circ \varphi \in BV[c,d]$ and that $g = f \circ \varphi \in \mathcal{R}_\beta[c,d]$. Moreover, $\int_c^d g\,d\beta = \int_a^b f\,d\alpha$.

43. Given a sequence $(x_n)$ of distinct points in $(a,b)$ and a sequence $(c_n)$ of real numbers with $\sum_{n=1}^\infty |c_n| < \infty$, define $\alpha(x) = \sum_{n=1}^\infty c_n I(x - x_n)$. Show that $\int_a^b f\,d\alpha = \sum_{n=1}^\infty c_n f(x_n)$ for every $f \in C[a,b]$. [Hint: Write $\alpha_N(x) = \sum_{n=1}^N c_n I(x - x_n)$ and use Exercise 41.]

44. Given a sequence $(x_n)$ of distinct points in $(a,b)$ and a sequence $(c_n)$ of real numbers with $\sum_{n=1}^\infty |c_n| < \infty$, define $\alpha$ by $\alpha(x) = c_n$ if $x = x_n$ and $\alpha(x) = 0$ otherwise. Show that $\alpha \in BV[a,b]$ and that $\int_a^b f\,d\alpha = 0$ for every $f \in C[a,b]$. Compare this result with Exercise 13.

45. Given $\alpha \in BV[a,b]$, show that there is a function $\beta \in BV[a,b]$ such that $\beta$ is right-continuous on $(a,b)$ and $\int_a^b f\,d\alpha = \int_a^b f\,d\beta$ for all $f \in C[a,b]$. [Hint: Define $\beta(a) = \alpha(a)$, $\beta(x) = \alpha(x+)$ for $a < x < b$, and $\beta(b) = \alpha(b)$. See Exercise 13.16.]

46. Suppose that $\alpha$ is differentiable, and that $\alpha'$ is a bounded, Riemann integrable function on $[a,b]$. Show that $f \in \mathcal{R}_\alpha[a,b]$ if and only if $f\alpha' \in \mathcal{R}[a,b]$. In this case, $\int_a^b f\,d\alpha = \int_a^b f(x)\alpha'(x)\,dx$. [Hint: $\alpha$ is of bounded variation. Why?]

47. Show that $|\int_a^b \alpha\,df| \le \|\alpha\|_{BV}\,\|f - f(b)\|_\infty$ for $\alpha \in BV[a,b]$ and $f \in C[a,b]$. [Hint: $df = d(f - f(b))$, where $f(b)$ is a constant function.]
48. Suppose that $(\alpha_n)$ is a sequence in $BV[a,b]$ and that $f \in \mathcal{R}_{\alpha_n}$ for all $n$. If $V_a^b(\alpha_n - \alpha) \to 0$, show that $f \in \mathcal{R}_\alpha$ and that $\int_a^b f\,d\alpha_n \to \int_a^b f\,d\alpha$. [Hints: (i) First argue that $I = \lim_{n\to\infty} \int_a^b f\,d\alpha_n$ exists. (ii) Next show that $|S_{\alpha_n}(f,P,T) - S_\alpha(f,P,T)| \to 0$ for any $P$, $T$. (iii) Finally, an $\varepsilon/3$ argument shows that $|S_\alpha(f,P,T) - I| < \varepsilon$ for some suitable $P$.]

49. Let $f \in C[a,b]$. Given $\varepsilon > 0$, show that there exists a $\delta > 0$ such that $|\int_a^b f\,d\alpha - S(f,P,T)| \le \varepsilon V_a^b \alpha$ for all partitions $P = \{x_0, \dots, x_n\}$ with $\max_{1 \le i \le n}(x_i - x_{i-1}) < \delta$, any $T$, and any $\alpha \in BV[a,b]$. [Hint: First show that $\int_a^b f\,d\alpha - S(f,P,T) = \sum_{i=1}^n \int_{x_{i-1}}^{x_i} (f(x) - f(t_i))\,d\alpha(x)$.]


The Riemann Integral 

Let’s put aside our discussion of esoteric topics for a moment and turn our attention to two concrete problems raised at the beginning of this chapter.

• Precisely which functions are Riemann integrable? If $f$ is Riemann integrable, must $f$ have a point of continuity?

• If $\alpha$ is increasing, does $\alpha'$ exist at all? Even at one point?

Now these are big questions. And, although it will take us a while, we will give complete answers to both. For now, let’s see how we might take advantage of such information in connection with Stieltjes integrals. In this section we will give (incomplete) answers to the following questions.

• When does a Riemann-Stieltjes integral reduce to a Riemann integral? In particular, when is it true that $\int_a^b f\,d\alpha = \int_a^b f(x)\alpha'(x)\,dx$? (The first integral is a Stieltjes integral, while the second is a Riemann integral.)

• When does the formula $\int_a^b f'(x)\,dx = f(b) - f(a)$ hold?

The answer to both of these questions is contained in our next result.

Theorem 14.18. Suppose that $\alpha'$ exists and is a (bounded) Riemann integrable function on $[a,b]$. Then, given a bounded function $f$ on $[a,b]$, we have $f \in \mathcal{R}_\alpha[a,b]$ if and only if $f\alpha' \in \mathcal{R}[a,b]$. In either case,

$$
\int_a^b f\,d\alpha = \int_a^b f(x)\alpha'(x)\,dx.
$$

proof. We want to compare $S_\alpha(f,P,T)$ and $S_x(f\alpha',P,T)$, where $S_x$ denotes a Riemann sum (i.e., a Riemann-Stieltjes sum with respect to the weight $x \mapsto x$).

Let $\varepsilon > 0$. Since $\alpha'$ is Riemann integrable, there is a partition $P_\varepsilon$ so that $U_x(\alpha',P) - L_x(\alpha',P) < \varepsilon$ for all $P \supset P_\varepsilon$. (Again, $U_x$ and $L_x$ denote the ordinary upper and lower Riemann sums.) In particular, if $T = \{t_1, \dots, t_n\}$ and $T^* = \{s_1, \dots, s_n\}$ are any selections of points with $t_i, s_i \in [x_{i-1}, x_i]$, then

$$
\sum_{i=1}^n |\alpha'(s_i) - \alpha'(t_i)|\,\Delta x_i < \varepsilon. \quad\text{(Why?)}
$$

Next, the mean value theorem allows us to write

$$
S_\alpha(f,P,T) = \sum_{i=1}^n f(t_i)[\alpha(x_i) - \alpha(x_{i-1})] = \sum_{i=1}^n f(t_i)\alpha'(s_i)\,\Delta x_i
$$

for some $s_i \in (x_{i-1}, x_i)$.

Finally,

$$
\begin{aligned}
|S_\alpha(f,P,T) - S_x(f\alpha',P,T)| &= \left| \sum_{i=1}^n f(t_i)\alpha'(s_i)\,\Delta x_i - \sum_{i=1}^n f(t_i)\alpha'(t_i)\,\Delta x_i \right| \\
&\le \|f\|_\infty \sum_{i=1}^n |\alpha'(s_i) - \alpha'(t_i)|\,\Delta x_i < \varepsilon\|f\|_\infty
\end{aligned}
$$

for any $T$ and any $P \supset P_\varepsilon$. Thus, if either integral exists, then so must the other, and they are necessarily equal. $\square$
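To see Theorem 14.18 at work, the sketch below compares $S_\alpha(f,P,T)$ with the Riemann sum $S_x(f\alpha',P,T)$ on uniform partitions with midpoint tags, for the illustrative choices $f = \cos$ and $\alpha(x) = x^3$ on $[0,1]$. Integrating by parts twice gives $\int_0^1 3x^2\cos x\,dx = 6\cos 1 - 3\sin 1$, which serves as a reference value. The proof bounds the gap between the two sums by $\|f\|_\infty\,[U_x(\alpha',P) - L_x(\alpha',P)]$, and this equals at most $3/n$ here, since $\alpha'(x) = 3x^2$ is increasing and $\|f\|_\infty \le 1$ on $[0,1]$. The function name `compare` is hypothetical.

```python
import math


def rs_sum(f, alpha, P, T):
    # S_alpha(f, P, T) = sum_i f(t_i) [alpha(x_i) - alpha(x_{i-1})]
    return sum(f(t) * (alpha(x1) - alpha(x0)) for x0, x1, t in zip(P, P[1:], T))


def compare(n):
    """Return (S_alpha(f, P, T), S_x(f alpha', P, T)) for f = cos, alpha(x) = x**3,
    the uniform partition of [0, 1] into n pieces, and midpoint tags."""
    P = [i / n for i in range(n + 1)]
    T = [(x0 + x1) / 2 for x0, x1 in zip(P, P[1:])]

    def alpha(x):
        return x ** 3

    def f_times_alpha_prime(x):
        return math.cos(x) * 3 * x ** 2

    s_alpha = rs_sum(math.cos, alpha, P, T)
    s_x = rs_sum(f_times_alpha_prime, lambda x: x, P, T)
    return s_alpha, s_x


exact = 6 * math.cos(1) - 3 * math.sin(1)  # integral of 3 x^2 cos x over [0, 1]
for n in (10, 100, 1000):
    s_alpha, s_x = compare(n)
    print(n, s_alpha, s_x, abs(s_alpha - s_x) <= 3 / n)
```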

Theorem 14.18 gives us one-half of the Fundamental Theorem of Calculus. (Just take $\alpha$ and $f$ in the formula above to be $f$ and $1$, respectively.)

Corollary 14.19. If $f$ is differentiable, and if $f'$ is a (bounded) Riemann integrable function on $[a,b]$, then $\int_a^b f'(x)\,dx = f(b) - f(a)$.

For the other half of the Fundamental Theorem, we want to show that the function $F(x) = \int_a^x f$ is differentiable, and that $F' = f$. Again, it wouldn’t hurt to do this in some generality.

Theorem 14.20. Let $\alpha$ be increasing, and let $f \in \mathcal{R}_\alpha[a,b]$. Define $F(x) = \int_a^x f\,d\alpha$ for $a \le x \le b$. Then:

(i) $F \in BV[a,b]$;

(ii) $F$ is continuous at each point where $\alpha$ is continuous;

(iii) $F$ is differentiable at each point where $\alpha$ is differentiable and $f$ is continuous. At any such point, $F'(x) = f(x)\alpha'(x)$.

proof. First note that, for $x < y$, we have

$$
|F(y) - F(x)| = \left| \int_x^y f\,d\alpha \right| \le \|f\|_\infty[\alpha(y) - \alpha(x)].
$$

And now (ii) plainly follows. The proof of (i) merely requires summing such differences; hence,

$$
V(F,P) \le \|f\|_\infty V(\alpha,P) = \|f\|_\infty[\alpha(b) - \alpha(a)].
$$

Finally, to prove (iii), we need to fine-tune the first inequality to an equality. Specifically, the mean value theorem for integrals (Exercise 11) says that $\int_x^y f\,d\alpha = c[\alpha(y) - \alpha(x)]$ for some $c$ (depending on $x$ and $y$) between $\inf_{[x,y]} f$ and $\sup_{[x,y]} f$. If we divide by $y - x$, then

$$
\frac{F(y) - F(x)}{y - x} = c\,\frac{\alpha(y) - \alpha(x)}{y - x} \to f(x)\alpha'(x) \quad\text{as } y \to x,
$$

provided that $f$ is continuous at $x$ and $\alpha'(x)$ exists. (Why?) $\square$


Corollary 14.21. Let $f \in \mathcal{R}[a,b]$, and let $F(x) = \int_a^x f(t)\,dt$. Then $F \in C[a,b] \cap BV[a,b]$, and $F'(x) = f(x)$ at each point of continuity of $f$.

Corollary 14.22. Suppose that $\alpha'$ exists and is Riemann integrable on $[a,b]$. If $\beta(x) = V_a^x \alpha$ for $a \le x \le b$, then $\beta$ is differentiable at each point where $\alpha'$ is continuous. At any such point, $\beta'(x) = |\alpha'(x)|$.

At the risk of being repetitious, let’s recall the two questions that we posed at the beginning of the section: If $f$ is Riemann integrable, does $f$ have any points of continuity at all? If $\alpha$ is increasing, does $\alpha'$ exist at all? Food for thought!


EXERCISES 

50. If $f$ is continuous on $[a,b]$, and if $\int_a^b |f(x)|\,dx = 0$, show that $f = 0$.

51. If $f$ is continuous on $[a,b]$, and if $\int_a^x f(t)\,dt = 0$ for all $x$ in $[a,b]$, show that $f = 0$.


The Riesz Representation Theorem 


As pointed out in Corollary 14.17, if $\alpha$ is of bounded variation on $[a,b]$, then the map $f \mapsto \int_a^b f\,d\alpha$ is a continuous, linear, real-valued function on $C[a,b]$. As it happens, every continuous, linear, real-valued function on $C[a,b]$ is necessarily of this same form. In much the same way that a linear, real-valued map on $\mathbb{R}^n$ is represented by inner product against some fixed vector, a continuous, linear, real-valued map on $C[a,b]$ is represented by integration against some function in $BV[a,b]$. In part, the Riesz representation theorem states that if $L : C[a,b] \to \mathbb{R}$ is continuous and linear, then there exists an $\alpha \in BV[a,b]$ such that

$$
L(f) = \int_a^b f\,d\alpha
$$

for all $f \in C[a,b]$.
Now there are many proofs of Riesz’s theorem. The proof that we will give is based largely on the observation made in Exercises 7 and 30 that finite sums of the form

$$
L_n(f) = \sum_{i=1}^n c_i f(x_i)
$$

can be represented as integration against a step function $\alpha_n$. This gives us a plan of attack: We will approximate the linear map $L$ by a finite sum of the form $L_n$, which is represented by a step function $\alpha_n$, and argue that $(\alpha_n)$ converges to a function $\alpha$ of bounded variation that represents $L$.

This particular approach has the advantage that it is in keeping with the spirit of Riemann integration. After all, if $L(f)$ is supposed to be an integral, then it ought to be a limit of integrals of step functions. The only catch here is that we look for a “global” approximation to $L$ itself rather than a “local” approximation to a particular $f$.

Before we can hope to give a proof of Riesz’s theorem, then, we will want to review 
a few facts about linear maps, and we will also need to have a few more convergence 
results at our disposal. 


Examples 14.23 

(a) If $V$ is a normed vector space, then a linear map $L : V \to \mathbb{R}$ is continuous precisely when it is Lipschitz (Theorem 8.20), that is, if and only if there is a constant $K$ such that $|L(x)| \le K\|x\|$ for every $x \in V$. Said another way, $L$ is continuous if and only if

$$
\|L\| = \sup_{x \ne 0} \frac{|L(x)|}{\|x\|} < \infty.
$$

The number $\|L\|$ is called the norm of $L$; it is the smallest constant $K$ that works in the inequality above. In particular, $|L(x)| \le \|L\|\,\|x\|$ for every $x \in V$.

(b) Let’s clarify our claim about linear maps on $\mathbb{R}^n$. Recall that every linear map $L : \mathbb{R}^n \to \mathbb{R}$ can be written as $L(x) = \langle x, y \rangle$ for some $y \in \mathbb{R}^n$. Moreover, the representing vector $y$ is unique. Indeed, if $\langle x, y_1 \rangle = \langle x, y_2 \rangle$ for every $x$, then $\langle x, y_1 - y_2 \rangle = 0$ for every $x$, and it is easy to see that this forces $y_1 - y_2 = 0$. What’s more, the Cauchy-Schwarz inequality tells us that $L$ is continuous: $|L(x)| = |\langle x, y \rangle| \le \|y\|_2\,\|x\|_2$. That is, the constant $K = \|y\|_2$ “works,” so we must have $\|L\| \le \|y\|_2$. But, in fact, $\|L\| = \|y\|_2$ since we also have $\|y\|_2^2 = \langle y, y \rangle = L(y) \le \|L\|\,\|y\|_2$.

(c) If $\alpha \in BV[a,b]$, then the map defined by $L(f) = \int_a^b f\,d\alpha$ for $f \in C[a,b]$ is continuous since $|L(f)| \le V_a^b \alpha\,\|f\|_\infty$ for every $f \in C[a,b]$. That is, $\|L\| \le V_a^b \alpha$. It is possible to show that we actually have $\|L\| = V_a^b \alpha$, at least when $\alpha$ is right-continuous on $(a,b)$ (see Exercises 52 and 55). Note, however, that the map $L$ has more than one representative in $BV[a,b]$. For any constant $c$ we have $\int_a^b f\,d\alpha = \int_a^b f\,d(\alpha + c)$, and $V_a^b(\alpha + c) = V_a^b \alpha$, too. To instill a measure of uniqueness in Riesz’s theorem, then, we will want to “nail down” our representative by insisting that $\alpha(a) = 0$, for instance. This alone will not quite do the trick, but it helps. (See Exercises 52 and 53.)
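The computation in Example 14.23 (b) is easy to test numerically. The sketch below, with the illustrative choice $y = (3, -4, 12)$ (so that $\|y\|_2 = 13$), samples many random vectors $x$ and confirms that $|L(x)|/\|x\|_2$ never exceeds $\|y\|_2$, while $x = y$ attains that value exactly. The helper names are hypothetical.

```python
import math
import random


def norm2(v):
    # Euclidean norm ||v||_2
    return math.sqrt(sum(c * c for c in v))


def make_functional(y):
    # the linear map L(x) = <x, y> on R^n
    return lambda x: sum(a * b for a, b in zip(x, y))


def ratio(L, x):
    # |L(x)| / ||x||_2 for x != 0
    return abs(L(x)) / norm2(x)


y = [3.0, -4.0, 12.0]  # illustrative choice with ||y||_2 = 13
L = make_functional(y)

random.seed(1)
samples = [[random.gauss(0.0, 1.0) for _ in y] for _ in range(10000)]
best_sampled = max(ratio(L, x) for x in samples)

print(best_sampled <= norm2(y) + 1e-12)  # True, by Cauchy-Schwarz
print(ratio(L, y))                        # 13.0, so the supremum is attained at x = y
```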
EXERCISES

52. Given $\alpha \in BV[a,b]$, define $\beta(a) = \alpha(a)$, $\beta(x) = \alpha(x+)$ for $a < x < b$, and $\beta(b) = \alpha(b)$. Show that $\beta$ is right-continuous on $(a,b)$, that $\beta \in BV[a,b]$, and that $\int_a^b f\,d\alpha = \int_a^b f\,d\beta$ for every $f \in C[a,b]$.

53. Given $\alpha \in BV[a,b]$, show that there is a unique $\beta \in BV[a,b]$ with $\beta(a) = 0$ such that $\beta$ is right-continuous on $(a,b)$ and $\int_a^b f\,d\alpha = \int_a^b f\,d\beta$ for every $f \in C[a,b]$.

54. Suppose that $\alpha$ is right-continuous and increasing. Given $\varepsilon > 0$ and $[c,d] \subset [a,b]$, construct a continuous function $f$ with $0 \le f \le 1$ such that $\int_a^b f\,d\alpha > \alpha(d) - \alpha(c) - \varepsilon$. [Hint: $f$ should “look like” $\chi_{[c,d]}$.]

55. Let $\alpha \in BV[a,b]$ be right-continuous. Given $\varepsilon > 0$ and a partition $P$ of $[a,b]$, construct $f \in C[a,b]$ with $\|f\|_\infty \le 1$ such that $\int_a^b f\,d\alpha > V(\alpha,P) - \varepsilon$. Conclude that $V_a^b \alpha = \sup\{ \int_a^b f\,d\alpha : \|f\|_\infty \le 1 \}$.


Next we focus our attention on convergence. The particular result that we need is a 
companion to Helly’s first theorem (Theorem 13.16). 

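Helly’s Second Theorem 14.24. Suppose that $(\alpha_n)$ is a sequence in $BV[a,b]$. If $\alpha_n \to \alpha$ pointwise on $[a,b]$, and if $V_a^b \alpha_n \le K$ for all $n$, then $\alpha \in BV[a,b]$ and $\int_a^b f\,d\alpha_n \to \int_a^b f\,d\alpha$ for all $f \in C[a,b]$.

proof. The fact that $\alpha \in BV[a,b]$ follows from the observation that $V(\alpha,P) = \lim_{n\to\infty} V(\alpha_n,P)$ for any partition $P$. Thus, $V_a^b \alpha \le K$, too. Hence, if $f \in C[a,b]$, then $f \in \mathcal{R}_\alpha[a,b]$.

Now let $f \in C[a,b]$ and $\varepsilon > 0$. Since $f$ is uniformly continuous, we can find a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon/(3K)$ whenever $|x - y| < \delta$. Thus, if we fix a partition $P = \{x_0, \dots, x_n\}$ with $\max_{1 \le i \le n}(x_i - x_{i-1}) < \delta$, then

$$
\begin{aligned}
\left| \int_a^b f\,d\alpha - S_\alpha(f,P,T) \right| &= \left| \sum_{i=1}^n \int_{x_{i-1}}^{x_i} [f(x) - f(t_i)]\,d\alpha(x) \right| \\
&\le \sum_{i=1}^n \left( \frac{\varepsilon}{3K} \right) V_{x_{i-1}}^{x_i} \alpha \\
&= \left( \frac{\varepsilon}{3K} \right) V_a^b \alpha \le \frac{\varepsilon}{3}.
\end{aligned}
$$

What’s more, this same calculation applies equally well to any $\alpha_n$, and hence we have $|\int_a^b f\,d\alpha_n - S_{\alpha_n}(f,P,T)| \le \varepsilon/3$, too.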
Next, notice that

$$
|S_\alpha(f,P,T) - S_{\alpha_n}(f,P,T)| = |S_{\alpha - \alpha_n}(f,P,T)| \le \|f\|_\infty V(\alpha - \alpha_n, P).
$$

Since $P$ is fixed and $\alpha_n \to \alpha$ pointwise, we have $\|f\|_\infty V(\alpha - \alpha_n, P) < \varepsilon/3$ for all $n$ sufficiently large. Thus, $|\int_a^b f\,d\alpha - \int_a^b f\,d\alpha_n| < \varepsilon$ for all such $n$. $\square$
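Helly’s second theorem can be watched in action on the step functions $\alpha_n(x) = [n^x]/n$ from the hint to Exercise 56 below. Each $\alpha_n$ is increasing with $V_0^1\alpha_n = 1 - 1/n \le 1$, it jumps by $1/n$ at each point $\log k/\log n$ ($k = 2, \dots, n$), and $\alpha_n$ converges pointwise to the function that is $0$ on $[0,1)$ and $1$ at $1$. So $\int_0^1 f\,d\alpha_n = (1/n)\sum_{k=2}^n f(\log k/\log n)$ should tend to $f(1)$. The sketch below (the function name is hypothetical) computes these integrals directly. For $f(x) = x$ the sum equals $\log(n!)/(n\log n)$, which, by Stirling’s formula, approaches $1$ only at the slow rate $1 - 1/\log n$.

```python
import math


def helly_average(f, n):
    """Return (1/n) * sum_{k=2}^{n} f(log k / log n).

    This equals the integral of f over [0, 1] against the step function
    alpha_n(x) = floor(n**x) / n, which jumps by 1/n at each x = log k / log n."""
    log_n = math.log(n)
    return sum(f(math.log(k) / log_n) for k in range(2, n + 1)) / n


for n in (10, 100, 1000, 10000, 100000):
    print(n, helly_average(lambda x: x, n), helly_average(math.cos, n))
# the values approach f(1), slowly: 1 for f(x) = x, and cos(1) for f = cos
```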


EXERCISE 

56. If $f \in C[0,1]$, show that $(1/n)\sum_{k=2}^n f(\log k/\log n) \to f(1)$ as $n \to \infty$. [Hint: Consider $\alpha_n(x) = [n^x]/n$.]


Helly’s second theorem allows us to further simplify our study of integration against functions of bounded variation. Recall from Chapter Thirteen that each function $\alpha$ of bounded variation may be written $\alpha = \alpha_c + \alpha_s$, where $\alpha_c$ is continuous and $\alpha_s$ is a saltus or “pure jump” function. Now it is easy to see that a saltus function $\alpha_s$ is the pointwise limit of a sequence of step functions, say $(\beta_n)$, with $V_a^b \beta_n \le V_a^b \alpha_s$. (See, for example, Exercise 14.) Thus, for any $f \in C[a,b]$,

$$
\int_a^b f\,d\alpha = \int_a^b f\,d\alpha_c + \lim_{n\to\infty} \int_a^b f\,d\beta_n.
$$

Integration against step functions may be directly computed; the limit in the second term would yield an infinite series (see Exercise 43). Thus, we would only have to concern ourselves with integration against a continuous function of bounded variation. As we will see, this case has much in common with the Riemann integral.

For convenience, let’s consolidate Helly’s first and second theorems. 


Corollary 14.25. Suppose that $(\beta_n)$ is a bounded sequence in $BV[a,b]$; that is, suppose that $\|\beta_n\|_{BV} \le K$ for all $n$. Then some subsequence $(\alpha_n)$ of $(\beta_n)$ converges pointwise to a function $\alpha$ on $[a,b]$ with $\|\alpha\|_{BV} \le K$. Moreover, $\int_a^b f\,d\alpha_n \to \int_a^b f\,d\alpha$ for every $f \in C[a,b]$.

With all of this machinery at our disposal, we can make short work of the proof of 
Riesz’s theorem. 


The Riesz Representation Theorem 14.26. Given a continuous, linear map $L : C[a,b] \to \mathbb{R}$, there exists an $\alpha \in BV[a,b]$ with $V_a^b \alpha = \|L\|$ such that

$$
L(f) = \int_a^b f\,d\alpha \quad\text{for all } f \in C[a,b].
$$

Moreover, we may take $\alpha$ to be right-continuous on $(a,b)$ with $\alpha(a) = 0$. In this case, $\alpha$ is unique.


proof. We will prove only the existence of $\alpha$; the uniqueness claim is left as an exercise (see Exercise 53).





First note that by Lemma 11.1 and Exercise 42 it is enough to prove the theorem for $[a,b] = [0,1]$. Indeed, if $\varphi(t) = a + t(b-a)$, $0 \le t \le 1$, then $\tilde L(g) = L(g \circ \varphi^{-1})$, where $g \in C[0,1]$, defines a continuous linear map on $C[0,1]$. If we can find some $\beta \in BV[0,1]$ such that $\tilde L(g) = \int_0^1 g\,d\beta$, then, since $\varphi^{-1}$ is strictly increasing, it follows that $\alpha = \beta \circ \varphi^{-1}$ is in $BV[a,b]$ and that $L(g \circ \varphi^{-1}) = \int_a^b g \circ \varphi^{-1}\,d(\beta \circ \varphi^{-1})$ for all $g \in C[0,1]$. That is, $L(f) = \int_a^b f\,d\alpha$ for all $f \in C[a,b]$.

Our motive for translating the problem to $[0,1]$ is essentially cosmetic: We can now take advantage of the Bernstein polynomials (without introducing any additional translations). Recall that if we write

$$
p_{n,k}(x) = \binom{n}{k} x^k (1-x)^{n-k}, \quad\text{for } 0 \le k \le n,
$$

then $B_n(f) = \sum_{k=0}^n f(k/n)\,p_{n,k} \rightrightarrows f$ on $[0,1]$ for any $f \in C[0,1]$. Thus, since $L$ is continuous and linear, we have

$$
L(B_n(f)) = \sum_{k=0}^n f\!\left(\frac{k}{n}\right) L(p_{n,k}) \to L(f)
$$

for any $f \in C[0,1]$. And here’s the key: The numbers $L(p_{n,k})$ do not depend on $f$!

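We next construct a sequence of step functions $(\alpha_n)$ such that

$$
\int_0^1 f\,d\alpha_n = L(B_n(f))
$$

for all $f \in C[0,1]$. This is easy; just set

$$
\begin{aligned}
\alpha_n(0) &= 0, \\
\alpha_n(x) &= L(p_{n,0}), \quad\text{for } 0 < x < \tfrac{1}{n}, \\
\alpha_n(x) &= \sum_{j=0}^{k} L(p_{n,j}), \quad\text{for } \tfrac{k}{n} \le x < \tfrac{k+1}{n},\ k = 1, \dots, n-1, \\
\alpha_n(1) &= \sum_{j=0}^{n} L(p_{n,j}).
\end{aligned}
$$

Then $\alpha_n$ is a step function with a jump of size $L(p_{n,k})$ at $k/n$, $k = 0, 1, \dots, n$. Thus, $\int_0^1 f\,d\alpha_n = L(B_n(f)) \to L(f)$ for all $f \in C[0,1]$. Note that $\alpha_n$ is right-continuous on $(0,1)$ and $\alpha_n(0) = 0$.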

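All that remains, in light of Helly’s theorems, is to show that $V_0^1 \alpha_n$ is bounded independent of $n$. To this end, recall that the binomial sequence $(p_{n,k})$ satisfies $\sum_{k=0}^n p_{n,k} = (x + (1-x))^n = 1$ on $[0,1]$. Thus, choosing each sign $\pm$ below to agree with the sign of $L(p_{n,k})$,

$$
V_0^1 \alpha_n = \sum_{k=0}^n |L(p_{n,k})| = L\left( \sum_{k=0}^n \pm p_{n,k} \right) \le \|L\|\,\left\| \sum_{k=0}^n \pm p_{n,k} \right\|_\infty \le \|L\|\,\left\| \sum_{k=0}^n p_{n,k} \right\|_\infty = \|L\|.
$$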

Here’s where we stand: By Helly’s theorems, we may suppose that $(\alpha_n)$ converges pointwise to a function $\alpha$ on $[0,1]$ with

$$
\int_0^1 f\,d\alpha = \lim_{n\to\infty} \int_0^1 f\,d\alpha_n = L(f)
$$

for all $f \in C[0,1]$ and with $V_0^1 \alpha \le \|L\|$. Finally, since $L$ is integration against $\alpha$, it follows that we actually have $\|L\| = V_0^1 \alpha$. $\square$
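The construction in the proof can be carried out concretely. The sketch below takes, purely as an illustration, the functional $L(g) = g(1/3) - 2g(1/2)$ on $C[0,1]$, whose norm is $\|L\| = 3$. It computes the jumps $L(p_{n,k})$ of $\alpha_n$, checks that $V_0^1\alpha_n = \sum_k |L(p_{n,k})|$ stays below $\|L\|$, and compares $\int_0^1 f\,d\alpha_n = L(B_n(f))$ with $L(f)$ for $f = \cos$. The helper names are hypothetical; `math.comb` requires Python 3.8 or later.

```python
import math


def bernstein_basis(n, k, x):
    # p_{n,k}(x) = C(n, k) x^k (1 - x)^(n - k)
    return math.comb(n, k) * x ** k * (1 - x) ** (n - k)


def L(g):
    # hypothetical example of a continuous linear functional on C[0, 1];
    # |L(g)| <= 3 ||g||_inf, with equality for suitable g, so ||L|| = 3
    return g(1 / 3) - 2 * g(1 / 2)


def jumps(n):
    # alpha_n jumps by L(p_{n,k}) at the point k/n, k = 0, ..., n
    return [L(lambda x, k=k: bernstein_basis(n, k, x)) for k in range(n + 1)]


def integral_against_alpha_n(f, n):
    # integral of f d(alpha_n) = sum_k f(k/n) L(p_{n,k}) = L(B_n(f))
    return sum(f(k / n) * c for k, c in enumerate(jumps(n)))


def variation_alpha_n(n):
    # alpha_n(0) = 0 and alpha_n is a step function, so V_0^1 alpha_n = sum of |jumps|
    return sum(abs(c) for c in jumps(n))


for n in (10, 50, 200):
    print(n, variation_alpha_n(n), integral_against_alpha_n(math.cos, n), L(math.cos))
```

Since the $p_{n,k}$ sum to $1$, the jumps of $\alpha_n$ add up to $L(1) = -1$, which is the value $\alpha_n(1)$.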


Other Definitions, Other Properties 


In this section we briefly discuss a variation on our definition of the Riemann-Stieltjes 
integral. The emphasis here is on brevity, not on exhaustive generalization. For this 
reason, many of the details have been relegated to the exercises. 

Throughout this section, / and a will denote arbitrary, bounded, real-valued func- 
tions on [a, b]. 

We next compare our definition of the integral, which we will call the “refinement 
integral,” to one given in terms of the mesh or norm of a partition P, defined by 
|| P|| = maxi<i<„ |jc, - jc,_i |. The norm integral is defined to be 


(AO 





provided that this limit exists. That is, the norm integral (AO f* f da exists if and 
only if there is a number I with the property that, for every e > 0, there exists a 
8 > 0 so that | S(/, P,T) - I \ < e for any partition P with ||P|| < 8 and any choice 
of T. Again, if such a number I exists, then it is unique, and in this case we set 

(AO f da = I. 

We will not require any notation for the space of norm-integrable functions; we will 
use a, b ] exclusively for the space of refinement-integrable functions. 

It is easy to see that the existence of the norm integral $(N)\int_a^b f\,d\alpha$ implies the existence of the refinement integral $(RS)\int_a^b f\,d\alpha$. In fact, if you recall the proof of Theorem 14.5, we showed that continuous functions were refinement-integrable by proving the existence of the norm integral. The converse is false in general, however. Certain differences between the two integrals are described in the exercises. For our purposes, either integral will get us where we need to go, but more on this later.
EXERCISES

57. If $(N)\int_a^b f\,d\alpha$ exists, prove that the refinement integral $(RS)\int_a^b f\,d\alpha$ also exists, and the two are equal.

58. If $(N)\int_a^b f\,d\alpha$ exists, show that it equals $\lim_{n\to\infty} S(f, P_n, T_n)$, where $(P_n)$ is any sequence of partitions with $\|P_n\| \to 0$, and where $(T_n)$ is arbitrary. In particular, the sequence of “regular” partitions, consisting of $n$ equally spaced points, will do nicely.

59. If $f$ is continuous and $\alpha$ is increasing, show that $(N)\int_a^b f\,d\alpha$ exists. [Hint: Recall the proof of Theorem 14.5.]

60. If $\alpha$ is increasing, and if $I = (N)\int_a^b f\,d\alpha$ exists, show that

$$\lim_{\|P\|\to 0} L(f,P) = \lim_{\|P\|\to 0} U(f,P) = I.$$


61. In the notation of Exercise 6, show that

(a) $(RS)\int_{-1}^{1} \beta\,d\alpha$ exists.

(b) Given $\delta > 0$, there are partitions $Q$ and $R$, each having norm less than $\delta$, such that $L_\alpha(\beta, Q) = 1$ and $L_\alpha(\beta, R) = 0$. In other words, $(N)\int_{-1}^{1} \beta\,d\alpha$ does not exist.

(c) $(N)\int_{0}^{1} \beta\,d\alpha$ and $(N)\int_{-1}^{0} \beta\,d\alpha$ both exist (and both are 0).

62. If $f$ and $\alpha$ share a common one-sided discontinuity (that is, both are discontinuous from the same side at the same point), show that the refinement integral $(RS)\int_a^b f\,d\alpha$ does not exist.

63. If $f$ and $\alpha$ share a common point of discontinuity (of any kind), show that the norm integral $(N)\int_a^b f\,d\alpha$ does not exist.

64. Assuming that $(RS)\int_a^b f\,df$ exists, compute it! Under what conditions on $f$ will this integral exist?

65. Show that $(N)\int_a^b f\,d\alpha$ exists if and only if, for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $|S(f, P_1, T_1) - S(f, P_2, T_2)| < \varepsilon$ for any pair of partitions $P_1, P_2$ of norm less than $\delta$ and any $T_1, T_2$.

66. If $f$ is continuous and $\alpha$ is of bounded variation, show that $(N)\int_a^b f\,d\alpha$ exists and equals $(RS)\int_a^b f\,d\alpha$.


Since our primary applications for the Riemann-Stieltjes integral require only continuous integrands $f$ and bounded variation integrators $\alpha$, the canonical (Jordan) decomposition of $\alpha$ into the difference of increasing functions saves the day. Each of these increasing functions has the same points of continuity as $\alpha$ itself. By Exercise 66, the two definitions of $\int_a^b f\,d\alpha$ agree in this case. We are free to use whichever definition suits our fancy without fear of ambiguity.

Exercises 62 and 63 highlight the difference between the refinement integral and the norm integral. In general, the refinement integral admits a slightly larger class of integrable functions. If, for example, $\alpha$ is both continuous and increasing, then both definitions coincide; that is, either both integrals exist (and are equal) or neither exists. In particular, both approaches are equally valid for defining the Riemann integral.

Theorem 14.27. Suppose that $\alpha$ is continuous and increasing, and that $f$ is bounded. Then,

$$\lim_{\|P\|\to 0} U(f,P) = \inf_P U(f,P) \quad\text{and}\quad \lim_{\|P\|\to 0} L(f,P) = \sup_P L(f,P).$$

In particular, if $(RS)\int_a^b f\,d\alpha$ exists, then so does $(N)\int_a^b f\,d\alpha$, and the two integrals are equal.


proof. Set $U = \inf_P U(f,P)$. We will show that $\lim_{\|P\|\to 0} U(f,P) = U$. That is, given $\varepsilon > 0$, we will show that there is a $\delta > 0$ such that $U \le U(f,P) < U + \varepsilon$ for any partition $P$ with $\|P\| < \delta$.

To begin, let $\varepsilon > 0$, and choose $P^* = \{x_0^*, \ldots, x_k^*\}$ such that $U(f,P^*) < U + \varepsilon/2$. Now, since $\alpha$ is uniformly continuous, there is a $\delta^* > 0$ such that $|\alpha(x) - \alpha(y)| < \varepsilon/[4(k+1)\|f\|_\infty]$ whenever $|x - y| < \delta^*$. Finally, choose $0 < \delta < \delta^*$ so that $\delta < \min_{1\le j\le k}(x_j^* - x_{j-1}^*)$. The claim is that this $\delta$ works.

Let $P = \{x_0, \ldots, x_n\}$ be any partition with $\|P\| < \delta$. We already have $U(f, P\cup P^*) \le U(f,P^*) < U + \varepsilon/2$. It is therefore enough to show that $U(f,P) \le U(f, P\cup P^*) + \varepsilon/2$, or that $U(f,P) - U(f, P\cup P^*) \le \varepsilon/2$.

Suppose that we list the elements of $P \cup P^*$ in order, say,

$$x_0^* = x_0 < x_1 < x_2 < x_1^* < x_3 < x_4 < x_5 < x_2^* < \cdots < x_{n-1} < x_n = x_k^*.$$


Now, since $\max_{1\le i\le n}(x_i - x_{i-1}) < \delta < \min_{1\le j\le k}(x_j^* - x_{j-1}^*)$, it follows that a typical interval $[x_{i-1}, x_i]$ can contain at most one $x_j^*$. There are at most $k+1$ such intervals.

We need not worry about those intervals $[x_{i-1}, x_i]$ that do not contain an $x_j^*$. For these, $P$ and $P\cup P^*$ share $[x_{i-1}, x_i]$ as a “basic” subinterval, so the common term in $U(f,P)$ and $U(f, P\cup P^*)$ cancels upon subtraction. So, let’s estimate a typical term in $U(f,P) - U(f, P\cup P^*)$ that is associated with an interval containing some $x_j^*$, say, $x_j^* \in [x_{i-1}, x_i]$. We write:

- $M_i$ for the supremum of $f$ over $[x_{i-1}, x_i]$, as usual,
- $M_i^*$ for the supremum of $f$ over $[x_{i-1}, x_j^*]$,
- $M_i^{**}$ for the supremum of $f$ over $[x_j^*, x_i]$.

Then,


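$$\begin{aligned}
&M_i[\alpha(x_i) - \alpha(x_{i-1})] - M_i^*[\alpha(x_j^*) - \alpha(x_{i-1})] - M_i^{**}[\alpha(x_i) - \alpha(x_j^*)] \\
&\quad = (M_i - M_i^*)[\alpha(x_j^*) - \alpha(x_{i-1})] + (M_i - M_i^{**})[\alpha(x_i) - \alpha(x_j^*)] \\
&\quad \le 2\|f\|_\infty[\alpha(x_i) - \alpha(x_{i-1})] \\
&\quad < 2\|f\|_\infty \cdot \frac{\varepsilon}{4(k+1)\|f\|_\infty} = \frac{\varepsilon}{2(k+1)}.
\end{aligned}$$

Since there are at most $k+1$ such terms, $U(f,P) - U(f, P\cup P^*) < \varepsilon/2$. □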


This is only the tip of the integral iceberg. There are several other variations on the Riemann-Stieltjes integral; the refinement integral and the norm integral are simply the two most common definitions. What’s more, there is still room to move in other directions, too. For example, we might also consider unbounded intervals or unbounded integrands (i.e., “improper” integrals). The interested reader can find a wealth of information on such generalizations in the references given in the Notes and Remarks section at the end of this chapter.


Notes and Remarks 

For more on the history of the development of the integral, see the books by Hawkins 
[1970], Hobson [1927], Kline [1972], and Lebesgue [1928], and the articles by 
Hildebrandt [1917, 1938]. 

An easy-to-read and informative synopsis of Stieltjes’s own point of view is supplied
by the selection “Stieltjes on the Stieltjes integral,” in Birkhoff [1973]. In this short 
passage, translated from Stieltjes [1894], we find Stieltjes’s description of the problem 
of moments, his proofs that increasing functions have left- and right-hand limits, and 
his definition of the integral that bears his name. Lebesgue had a great deal to say about 
the Stieltjes integral, too. He devoted 61 pages of his Leçons to the topic, including a
discussion of Riesz’s theorem (Theorem 14.26) and a tribute to the genius of Cauchy, 
who, according to Lebesgue, had already considered the notion of integration against 
weight functions. Lebesgue’s insights on Cauchy’s work and its relationship to the 
physical world are reason enough to read this particular passage (see Lebesgue [1928, 
Chap. XI]). 

The notion of using upper and lower Riemann sums was independently introduced 
by several mathematicians in 1875, or thereabouts. These early approaches combined 
the features of the so-called “refinement” integral and the “norm” integral; rather than 
considering the supremum of lower sums, for example, one took the limit of $L(f,P)$ as
$\|P\| \to 0$. The approach that we have taken is somewhat more modern and, according
to Hildebrandt [1938], is due to Moore and Smith [1922] and Kolmogorov [1930]. For 
those who long for the “area under the graph” approach, see Bullock [1988]. 

Frigyes (Frederic, Friedrich) Riesz first proved his representation theorem (Theorem 
14.26) in Riesz [1909b]. It is fair to say that Riesz’s result brought the Stieltjes integral 
to the attention of the general mathematical public. He was clearly fond of this particular 
result, as he later published three more proofs, along with several other related results. 
Important among these is Riesz [1911], in which he adds further detail to his initial 
result. Eduard Helly also gave a proof in Helly [1912]. Here you will find Helly’s 
first and second theorems (Theorems 13.16 and 14.24) together with several clever 
observations used to prove Riesz’s theorem. It is interesting to note here that Helly 
refers to Riesz in regard to the “principle of choice” (Helly’s selection principle), and 
Riesz, in turn, refers to Fréchet’s thesis, Fréchet [1906]. The proofs given here of Helly’s
second theorem (Theorem 14.23) and of the Riesz representation theorem (Theorem 
14.25) are based largely on the presentation in Natanson [1955]. 

Both Helly and Riesz were interested in what has been variously called the Hausdorff 
or Stieltjes moment problem. In terms of Stieltjes integrals, the problem is to determine 
an increasing function $\alpha$, all of whose moments have been specified in advance. That
is, given a sequence of positive numbers $(c_k)$, find an increasing function $\alpha$ for which
$\int_a^b x^k\,d\alpha(x) = c_k$, where $k = 0, 1, 2, \ldots$. The moment problem was of pivotal importance
in the development of functional analysis and function spaces in general. If we interpret
each of the integrals as a finite sum, then we are led to consider a system of infinitely 
many linear equations in infinitely many unknowns. This approach led to the study 
of abstract, infinite-dimensional vector spaces. If, on the other hand, we think of the 
integral as a linear operation on $C[a,b]$, then the problem asks whether a linear map
whose value on each polynomial has been specified in advance may be represented 
as Stieltjes integration against some increasing function. This point of view led to 
the study of linear functions, or operators , between abstract vector spaces. For more 
information on the work of Helly and Riesz, especially with regard to its influence on 
the development of abstract spaces and functional analysis, see Bernkopf [1966, 1967],
Monna [1973], and Dieudonné [1981]. For more details on the moment problem itself,
see Shohat and Tamarkin [1943]. 

The Stieltjes integral is of value to probabilists and statisticians (you may have already
surmised this from the similarity of nomenclature: a probability density function
really is a density!). But do not take my word for it; just check out Volume 1 of the 
Annals of Mathematical Statistics. You will find two papers therein concerning the 
Stieltjes integral: Baten [1930] and Shohat [1930]. 

Work on the Stieltjes integral continues in modern times, too; witness Kenneth
Ross [1980a]. Ross’s approach seeks a middle ground between the norm integral and 
the refinement integral. A more complete discussion is available in his book, Ross 
[1980b]. 

Exercise 6 is taken, in part, from Rudin [1953]. Much of the flavor of Chapter 
Fourteen is borrowed from the tasty presentation in Apostol [1975]; Exercises 31, 37, 
38, and 56 are based on Apostol exercises. Theorem 14.26 is taken from Wheeden 
and Zygmund [1977], a source of still more information about Stieltjes integrals. Also 
see Natanson [1955, Vol. I], Johnsonbaugh and Pfaffenberger [1981], and Lojasiewicz 
[1988]. Exercise 47 is taken from lecture notes on a course in real analysis given by 
W. B. Johnson at The Ohio State University in 1974-75. 



CHAPTER FIFTEEN 


Fourier Series 


Preliminaries 


In Chapter Ten we defined the Fourier series associated to a $2\pi$-periodic function $f$, which is (bounded and Riemann) integrable on $[-\pi, \pi]$, by

$$\frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx),$$

where the Fourier coefficients $a_k$ and $b_k$ are given by

$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos kt\,dt \quad\text{and}\quad b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin kt\,dt.$$

Note that each of these integrals is defined and finite; in fact, $a_k$ and $b_k$ satisfy

$$|a_k| \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f(t)|\,dt \quad\text{and}\quad |b_k| \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f(t)|\,dt.$$

Thus, since $f$ is bounded, we even have $|a_k| \le 2\|f\|_\infty$ and $|b_k| \le 2\|f\|_\infty$. We denote the partial sums of this series by

$$s_n(f)(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx).$$

Please note that $s_n(f)$ is a trig polynomial of degree at most $n$; in symbols, $s_n(f) \in \mathcal{T}_n$.

While we will be interested in whether $s_n(f)$ converges to $f$, we will soon see that the Fourier series for $f$ provides a useful representation for $f$ even if the series should fail to converge pointwise to $f$. We mirror this in our notation by writing

$$f(x) \sim \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx).$$


Recall from our previous discussions that the key to the Fourier series representation is the fact that the functions $1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots$ are orthogonal on any interval of length $2\pi$. Specifically, taking $[-\pi, \pi]$ as our interval of choice, it is not hard to check that

$$\int_{-\pi}^{\pi} \cos mx \cos nx\,dx = \int_{-\pi}^{\pi} \sin mx \sin nx\,dx = \int_{-\pi}^{\pi} \cos mx \sin nx\,dx = 0$$

for any $m \ne n$ (where the last equation holds even for $m = n$),

$$\int_{-\pi}^{\pi} \cos^2 mx\,dx = \int_{-\pi}^{\pi} \sin^2 mx\,dx = \pi$$

for any $m \ne 0$, and, of course, $\int_{-\pi}^{\pi} 1\,dx = 2\pi$. (The fact that this last integral equals $2\pi$, rather than $\pi$, explains why we write the first Fourier coefficient as $a_0/2$.)
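These relations are easy to spot-check numerically. The sketch below uses helper names of our own. It applies the midpoint rule with $N$ equally spaced nodes on $[-\pi, \pi]$. For a trig polynomial of degree less than $N$ this rule is exact up to rounding. The reason is that, with $h = 2\pi/N$, the sum $\sum_{j=0}^{N-1} e^{ik(-\pi + (j+1/2)h)}$ vanishes whenever $0 < |k| < N$.

```python
from math import cos, sin, pi


def integrate_period(g, nodes=64):
    """Midpoint rule on [-pi, pi]; exact up to rounding for trig polynomials of degree < nodes."""
    h = 2 * pi / nodes
    return h * sum(g(-pi + (j + 0.5) * h) for j in range(nodes))


def c(m):
    return lambda x: cos(m * x)


def s(m):
    return lambda x: sin(m * x)


def product_integral(u, v):
    return integrate_period(lambda x: u(x) * v(x))


if __name__ == "__main__":
    print(product_integral(c(2), c(3)), product_integral(s(2), s(3)), product_integral(c(2), s(2)))
    print(product_integral(c(3), c(3)), product_integral(s(3), s(3)), integrate_period(lambda x: 1.0))
```

With $m, n \le 5$, every product has degree at most $10$. That is well below the $64$ nodes used, so the computed values agree with $0$, $\pi$, and $2\pi$ to rounding error.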


EXERCISES 

▷ 1. Let $f : \mathbb{R} \to \mathbb{R}$ be $2\pi$-periodic and Riemann integrable on $[-\pi, \pi]$. If $f$ is even (resp., odd), show that its Fourier series can be written using only cosine (resp., sine) terms.

2. Define $f(x) = \pi - x$ for $0 < x < 2\pi$, $f(0) = f(2\pi) = 0$, and extend $f$ to a $2\pi$-periodic function on $\mathbb{R}$ (in the obvious way). Show that the Fourier series for $f$ is $2\sum_{n=1}^{\infty} \sin nx / n$.

3. Let $f \in BV[-\pi, \pi]$ with $f(-\pi) = f(\pi)$. Show that both $(1/\pi)\int_{-\pi}^{\pi} f(x)\sin nx\,dx$ and $(1/\pi)\int_{-\pi}^{\pi} f(x)\cos nx\,dx$ exist, and that each is at most $(1/n)V_{-\pi}^{\pi} f$ in absolute value.


The study of the pointwise convergence of Fourier series has a long and checkered 
history - to paraphrase Halmos, its history includes “almost 200 years of barking up 
the wrong tree.” In all of its glory, pointwise convergence is a delicate and complex 
issue, arguably too complex to warrant thorough pursuit here. For this reason, we will 
be primarily concerned with the wealth of useful information that is already at hand. 
This “easy” approach will nevertheless provide some deep results of its own. Just 
watch! 


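Observations 15.1.

(a) If $T(x) = (\alpha_0/2) + \sum_{k=1}^{n} (\alpha_k \cos kx + \beta_k \sin kx)$ is a trig polynomial of degree $n$ and if $m = 1, \ldots, n$, then

$$\int_{-\pi}^{\pi} T(x)\cos mx\,dx = \alpha_m \int_{-\pi}^{\pi} \cos^2 mx\,dx = \pi\alpha_m,$$

while if $m = 0$, then

$$\int_{-\pi}^{\pi} T(x)\,dx = \frac{\alpha_0}{2}\int_{-\pi}^{\pi} 1\,dx = \pi\alpha_0.$$

Similarly, for $m = 1, 2, \ldots, n$,

$$\int_{-\pi}^{\pi} T(x)\sin mx\,dx = \beta_m \int_{-\pi}^{\pi} \sin^2 mx\,dx = \pi\beta_m.$$

If $m > n$, then each of these integrals is 0. Thus, if $T \in \mathcal{T}_n$, then $T$ is actually equal to its own Fourier series. Said another way: Given $T \in \mathcal{T}_n$, we have $s_m(T) = T$ whenever $m \ge n$.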





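(b) If $f$ (and hence also $f^2$) is Riemann integrable on $[-\pi, \pi]$, then $s_n(f)$ minimizes the integral

$$\int_{-\pi}^{\pi} [f(x) - T(x)]^2\,dx$$

over all choices of trig polynomials $T$ of degree at most $n$. To see this, let $T(x) = (\alpha_0/2) + \sum_{k=1}^{n} (\alpha_k \cos kx + \beta_k \sin kx)$ and first note that

$$\int_{-\pi}^{\pi} [f - T]^2 = \int_{-\pi}^{\pi} f^2 - 2\int_{-\pi}^{\pi} fT + \int_{-\pi}^{\pi} T^2.$$

By using the linearity of the integral and the orthogonality of the trig system, we can write each of the last two integrals in terms of the Fourier coefficients of $f$ and $T$. Indeed, from (a),

$$\int_{-\pi}^{\pi} f(x)T(x)\,dx = \frac{\alpha_0}{2}\int_{-\pi}^{\pi} f(x)\,dx + \sum_{k=1}^{n} \alpha_k \int_{-\pi}^{\pi} f(x)\cos kx\,dx + \sum_{k=1}^{n} \beta_k \int_{-\pi}^{\pi} f(x)\sin kx\,dx = \pi\Big[\frac{\alpha_0 a_0}{2} + \sum_{k=1}^{n} (\alpha_k a_k + \beta_k b_k)\Big]$$

and (after replacing $f$ by $T$ in the previous calculation)

$$\int_{-\pi}^{\pi} T(x)^2\,dx = \pi\Big[\frac{\alpha_0^2}{2} + \sum_{k=1}^{n} (\alpha_k^2 + \beta_k^2)\Big].$$

Now, since $\alpha_k^2 - 2\alpha_k a_k = (\alpha_k - a_k)^2 - a_k^2$, we get

$$\frac{1}{\pi}\int_{-\pi}^{\pi} [f(x) - T(x)]^2\,dx = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx - \frac{a_0^2}{2} - \sum_{k=1}^{n} (a_k^2 + b_k^2) + \frac{(\alpha_0 - a_0)^2}{2} + \sum_{k=1}^{n} \big((\alpha_k - a_k)^2 + (\beta_k - b_k)^2\big).$$

The right-hand side is minimized precisely when $\alpha_k = a_k$ and $\beta_k = b_k$ for all $k$, in other words, precisely when $T = s_n(f)$. Please note that in this case we have

$$\frac{1}{\pi}\int_{-\pi}^{\pi} [f - s_n(f)]^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx - \frac{a_0^2}{2} - \sum_{k=1}^{n} (a_k^2 + b_k^2) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx - \frac{1}{\pi}\int_{-\pi}^{\pi} s_n(f)(x)^2\,dx.$$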

ft J -7T ft J —77 


(c) The calculation in (b) leads us to consider the $L_2$-norm, defined by

$$\|f\|_2 = \Big(\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx\Big)^{1/2}, \tag{15.1}$$

where we assume here that $f$ is Riemann integrable. The proof that this expression defines a (semi-)norm is essentially identical to the proof that we gave in Chapter Three for the $\ell_2$-norm (Lemma 3.3 and Theorem 3.4). We will save the details for a later section, where we will prove an even more general result. Please note that if $f \in C^{2\pi}$, then $\|f\|_2 \le \sqrt{2}\,\|f\|_\infty$.

Of greatest importance to us is the fact that we have a “continuous” analogue of the familiar “dot product” (or inner product; see the discussion preceding Lemma 3.3). In particular, if $f$ and $g$ are Riemann integrable, then the map

$$(f, g) \mapsto \langle f, g\rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)g(x)\,dx$$

satisfies all of the familiar properties of the dot product in $\mathbb{R}^n$. Specifically, the map is linear in each of its arguments, satisfies the Cauchy-Schwarz inequality (see Theorem 14.7 (v)):

$$\Big|\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)g(x)\,dx\Big| \le \Big(\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx\Big)^{1/2}\Big(\frac{1}{\pi}\int_{-\pi}^{\pi} g(x)^2\,dx\Big)^{1/2},$$

and is related to the $L_2$-norm by $\|f\|_2 = \sqrt{\langle f, f\rangle}$.

We can now clarify the claim made in (a): The functions $1, \cos x, \cos 2x, \ldots, \sin x, \sin 2x, \ldots$ are orthogonal in the sense that any two distinct functions from the list have zero “dot product.” Moreover, the functions $1/\sqrt{2}, \cos x, \cos 2x, \ldots, \sin x, \sin 2x, \ldots$ are actually orthonormal; that is, they are mutually orthogonal and each has $L_2$-norm one (thanks to the extra factor $1/\pi$ in equation (15.1)).

(d) Observation (b) can now be rephrased: The partial sum $s_n(f)$ is the nearest point to $f$ out of $\mathcal{T}_n$ relative to the $L_2$-norm. In other words,

$$\inf_{T\in\mathcal{T}_n} \|f - T\|_2 = \|f - s_n(f)\|_2.$$

Moreover,

$$\|f - s_n(f)\|_2^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx - \frac{a_0^2}{2} - \sum_{k=1}^{n} (a_k^2 + b_k^2) = \|f\|_2^2 - \|s_n(f)\|_2^2. \tag{15.2}$$

Since $\|f - s_n(f)\|_2^2 \ge 0$, we have

$$\|s_n(f)\|_2^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} s_n(f)(x)^2\,dx = \frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx = \|f\|_2^2.$$

In other words, $\|s_n(f)\|_2 \le \|f\|_2$. This result is known as Bessel’s inequality. Since $n$ is arbitrary, it follows that the Fourier coefficients of a Riemann square-integrable function $f$ are square-summable and satisfy

$$\frac{a_0^2}{2} + \sum_{k=1}^{\infty} (a_k^2 + b_k^2) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx. \tag{15.3}$$

In particular, the Fourier coefficients of $f$ must tend to zero:

$$\lim_{n\to\infty} \int_{-\pi}^{\pi} f(x)\cos nx\,dx = 0 = \lim_{n\to\infty} \int_{-\pi}^{\pi} f(x)\sin nx\,dx. \tag{15.4}$$

This fact is known as Riemann’s lemma and will prove very useful in subsequent observations.

(e) For $f \in C^{2\pi}$ we have $\|f - s_n(f)\|_2 \to 0$; that is, $f$ is the limit of its Fourier series in the $L_2$-norm. Indeed, given $\varepsilon > 0$, Weierstrass’s second theorem (Theorem 11.8) supplies a trig polynomial $T^*$, of some finite degree $m$, with $\|f - T^*\|_\infty < \varepsilon$. Thus, for all $n \ge m$,

$$\|f - s_n(f)\|_2 = \inf_{T\in\mathcal{T}_n} \|f - T\|_2 \le \sqrt{2}\,\inf_{T\in\mathcal{T}_n} \|f - T\|_\infty < \varepsilon\sqrt{2},$$

since $T^* \in \mathcal{T}_m \subset \mathcal{T}_n$.

(f) If $f$ and $g$ are Riemann integrable on $[-\pi, \pi]$, then $s_n(f + g) = s_n(f) + s_n(g)$ for every $n$. In fact, each Fourier coefficient of the sum $f + g$ is the sum of the corresponding Fourier coefficients for $f$ and $g$; for example,

$$\int_{-\pi}^{\pi} [f(x) + g(x)]\cos kx\,dx = \int_{-\pi}^{\pi} f(x)\cos kx\,dx + \int_{-\pi}^{\pi} g(x)\cos kx\,dx.$$

Essentially the same reasoning shows that $s_n(\alpha f + \beta g) = \alpha\,s_n(f) + \beta\,s_n(g)$ for any pair of real numbers $\alpha$ and $\beta$. In other words, the map $f \mapsto s_n(f)$ is linear.

(g) This linearity of $s_n$ allows us to extend the result in observation (e): If $f$ is Riemann integrable on $[-\pi, \pi]$, then $\|f - s_n(f)\|_2 \to 0$. It is in this sense that we justify the claim that $f$ is represented by its Fourier series. To see this, let $\varepsilon > 0$ and choose a function $g \in C^{2\pi}$ satisfying

$$\Big(\frac{1}{\pi}\int_{-\pi}^{\pi} [f(x) - g(x)]^2\,dx\Big)^{1/2} < \varepsilon.$$

(How? See Exercise 5.) Next, since $s_n$ is linear, we have

$$\|f - s_n(f)\|_2 \le \|f - g\|_2 + \|g - s_n(g)\|_2 + \|s_n(f - g)\|_2.$$

But, from Bessel’s inequality, $\|s_n(f - g)\|_2 \le \|f - g\|_2 < \varepsilon$, and so

$$\|f - s_n(f)\|_2 < 2\varepsilon + \|g - s_n(g)\|_2 < 3\varepsilon$$

for all $n$ sufficiently large, from observation (e).

(h) Combining results (d) through (g) we arrive at Parseval’s equation: If $f$ is Riemann integrable on $[-\pi, \pi]$, then $\|f\|_2^2 = \lim_{n\to\infty} \|s_n(f)\|_2^2$; that is,

$$\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\,dx = \frac{a_0^2}{2} + \sum_{k=1}^{\infty} (a_k^2 + b_k^2). \tag{15.5}$$

In other words, in light of equation (15.2), Parseval’s equation is equivalent to the statement that $\|f - s_n(f)\|_2 \to 0$.

(i) It is immediate from Parseval’s equation that distinct elements from $C^{2\pi}$ have different Fourier series. In other words, suppose $f, g \in C^{2\pi}$ satisfy $\int_{-\pi}^{\pi} [f(x) - g(x)]\cos nx\,dx = 0$ and $\int_{-\pi}^{\pi} [f(x) - g(x)]\sin nx\,dx = 0$ for all $n = 0, 1, 2, \ldots$. Then we would also have $\int_{-\pi}^{\pi} [f(x) - g(x)]^2\,dx = 0$. But, since $f$ and $g$ are continuous, this easily implies that $f - g = 0$. (How?) Compare this approach to that used in Exercise 11.31.

(j) Here is an easy consequence of our discussion of uniform convergence in Chapter Ten: If the Fourier series of a function $f \in C^{2\pi}$ is uniformly convergent, then the series must actually converge to $f$. Of course, if a trigonometric series is uniformly convergent, then its sum defines a continuous function; let’s call it $g \in C^{2\pi}$ in this case. All that remains is to notice that $g$ has the same Fourier coefficients as $f$, and this is easy. If $s_n(f)$ converges uniformly to $g$, then, for example, $s_n(f)(x)\cos kx$ converges uniformly to $g(x)\cos kx$. Interchanging limit and integral and using (a), we have

$$\frac{1}{\pi}\int_{-\pi}^{\pi} g(x)\cos kx\,dx = \lim_{n\to\infty} \frac{1}{\pi}\int_{-\pi}^{\pi} s_n(f)(x)\cos kx\,dx = a_k.$$

Similarly, $(1/\pi)\int_{-\pi}^{\pi} g(x)\sin kx\,dx = b_k$. According to our last observation, this means that $f = g$.

(k) If the Fourier coefficients for $f$ satisfy $\sum_{n=1}^{\infty} |a_n| < \infty$ and $\sum_{n=1}^{\infty} |b_n| < \infty$, then the Fourier series for $f$ is uniformly convergent on $\mathbb{R}$. This is an easy consequence of the M-test, Lemma 10.9. Thus, if we are also given that $f \in C^{2\pi}$, it follows from (j) that the Fourier series for $f$ converges uniformly to $f$.

The introduction of the $L_2$-norm is designed to make clear the sense in which a continuous function $f$ is “represented by” its Fourier series. While $f$ need not be the pointwise limit of its Fourier series (indeed the series may even diverge at certain points), $f$ is nevertheless the limit of its Fourier series in some metric, and limits in metric spaces are unique. (See Exercise 4 for more on this.) Consequently, each $f \in C^{2\pi}$ is uniquely determined by its Fourier series.


EXERCISES 

▷ 4. If $f$ is Riemann integrable on $[-\pi, \pi]$ and $\|f\|_2 = 0$, does it follow that $f = 0$? It is true if we assume, in addition, that $f$ is continuous. Why? In other words, assuming the validity of the triangle inequality, verify that the $L_2$-norm is truly a norm on $C[-\pi, \pi]$.

5. Let $f$ be Riemann integrable on $[-\pi, \pi]$, and let $\varepsilon > 0$.

(a) Show that there is a continuous function $g$ on $[-\pi, \pi]$ satisfying $\|f - g\|_2 < \varepsilon$. [Hint: Mimic the proof of Theorem 14.9.]

(b) Show that there is a continuous, $2\pi$-periodic function $h \in C^{2\pi}$ satisfying $\|f - h\|_2 < \varepsilon$.

(c) Show that there is a trig polynomial $T$ with $\|f - T\|_2 < \varepsilon$.

6. Let $f : \mathbb{R} \to \mathbb{R}$ be $2\pi$-periodic and Riemann integrable on $[-\pi, \pi]$. Prove that $\lim_{x\to 0} \int_{-\pi}^{\pi} |f(x + t) - f(t)|^2\,dt = 0$.

7. Define $f(x) = (\pi - x)^2$ for $0 \le x \le 2\pi$, and extend $f$ to a $2\pi$-periodic continuous function on $\mathbb{R}$ in the obvious way. Show that the Fourier series for $f$ is $\pi^2/3 + 4\sum_{n=1}^{\infty} \cos nx / n^2$. Since the series is uniformly convergent, it actually converges to $f$. In particular, note that setting $x = 0$ yields the familiar formula

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
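The conclusion of Exercise 7 can be checked numerically. The sketch below is illustrative, and the function names are ours. It compares the partial sums $\pi^2/3 + 4\sum_{k=1}^{n} \cos kx / k^2$ with the periodic extension of $(\pi - x)^2$ on a grid. Since $\sum_{k>n} 4/k^2 < 4/n$, the M-test bound predicts a uniform error below $4/n$.

```python
from math import cos, pi


def f(x):
    """(pi - x)^2 on [0, 2pi], extended to a 2pi-periodic function."""
    y = x % (2 * pi)
    return (pi - y) ** 2


def partial_sum(x, n):
    """pi^2/3 + 4 * sum over k = 1..n of cos(kx)/k^2."""
    return pi**2 / 3 + 4 * sum(cos(k * x) / k**2 for k in range(1, n + 1))


def sup_error(n, grid_points=200):
    xs = [2 * pi * j / grid_points for j in range(grid_points)]
    return max(abs(f(x) - partial_sum(x, n)) for x in xs)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, sup_error(n), 4 / n)
    # Setting x = 0 gives pi^2 = pi^2/3 + 4 * sum 1/k^2, that is, sum 1/k^2 = pi^2/6.
    print(sum(1 / k**2 for k in range(1, 100001)), pi**2 / 6)
```
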


Dirichlet’s Formula 


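To better understand the pointwise convergence of Fourier series, it would be helpful to have a closed expression for $s_n(f)$ (that is, an expression not involving a sum). For this we will need a few trig identities; the first two need no explanation:

$$\cos kt\cos kx + \sin kt\sin kx = \cos k(t - x)$$

$$2\cos\alpha\,\sin\beta = \sin(\alpha + \beta) - \sin(\alpha - \beta)$$

$$\frac{1}{2} + \cos\theta + \cos 2\theta + \cdots + \cos n\theta = \frac{\sin\big(n + \frac{1}{2}\big)\theta}{2\sin\frac{1}{2}\theta}.$$

Here is a short proof for the third. By the second identity, the sum below telescopes:

$$\sin\tfrac{1}{2}\theta + \sum_{k=1}^{n} 2\cos k\theta\,\sin\tfrac{1}{2}\theta = \sin\tfrac{1}{2}\theta + \sum_{k=1}^{n} \Big[\sin\big(k + \tfrac{1}{2}\big)\theta - \sin\big(k - \tfrac{1}{2}\big)\theta\Big] = \sin\big(n + \tfrac{1}{2}\big)\theta.$$

Dividing both sides by $2\sin\frac{1}{2}\theta$ gives the identity.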
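A quick numerical spot-check of the third identity follows. It is a sketch: the sample values of $\theta$ are arbitrary and avoid multiples of $2\pi$, where the right-hand side takes the form $0/0$.

```python
from math import cos, sin


def lhs(n, theta):
    """1/2 + cos(theta) + cos(2 theta) + ... + cos(n theta)."""
    return 0.5 + sum(cos(k * theta) for k in range(1, n + 1))


def rhs(n, theta):
    """sin((n + 1/2) theta) / (2 sin(theta/2)), valid when sin(theta/2) != 0."""
    return sin((n + 0.5) * theta) / (2 * sin(0.5 * theta))


if __name__ == "__main__":
    for theta in (0.1, 1.0, 2.5, -3.0):
        print(theta, [lhs(n, theta) - rhs(n, theta) for n in range(5)])
```
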


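Now we are ready to rewrite our formula for $s_n(f)$:

$$\begin{aligned}
s_n(f)(x) &= \frac{a_0}{2} + \sum_{k=1}^{n} (a_k\cos kx + b_k\sin kx) \\
&= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\Big[\frac{1}{2} + \sum_{k=1}^{n} (\cos kt\cos kx + \sin kt\sin kx)\Big]dt \\
&= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\,\frac{\sin\big(n + \frac{1}{2}\big)(t - x)}{2\sin\frac{1}{2}(t - x)}\,dt.
\end{aligned}$$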


The function

$$D_n(t) = \frac{1}{2} + \sum_{k=1}^{n} \cos kt = \frac{\sin\big(n + \frac{1}{2}\big)t}{2\sin\frac{1}{2}t} \tag{15.6}$$

is called Dirichlet’s kernel; note that $D_n \in \mathcal{T}_n$. In this notation, our formula for $s_n(f)$ reads

$$s_n(f)(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\,D_n(t - x)\,dt.$$

If $f$ is $2\pi$-periodic, then we may also write

$$s_n(f)(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\,D_n(t)\,dt.$$
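Dirichlet’s formula can be compared against the coefficient definition of $s_n(f)$. The sketch below uses our own test case: $f(x) = x^2$ on $[-\pi, \pi]$, extended $2\pi$-periodically, whose standard Fourier series is $\pi^2/3 + 4\sum_{k\ge1} (-1)^k \cos kx / k^2$. It evaluates $(1/\pi)\int_{-\pi}^{\pi} f(x + t)D_n(t)\,dt$ with a midpoint rule. The two computations should agree to within the quadrature error.

```python
from math import cos, pi


def f(y):
    """x^2 on [-pi, pi], extended 2pi-periodically."""
    z = (y + pi) % (2 * pi) - pi
    return z * z


def dirichlet_kernel(n, t):
    """D_n(t) = 1/2 + sum over k = 1..n of cos(kt)."""
    return 0.5 + sum(cos(k * t) for k in range(1, n + 1))


def s_n_from_coefficients(n, x):
    """a_0/2 + sum of a_k cos(kx), with a_0/2 = pi^2/3 and a_k = 4(-1)^k/k^2 (every b_k is 0)."""
    return pi**2 / 3 + sum(4 * (-1) ** k / k**2 * cos(k * x) for k in range(1, n + 1))


def s_n_from_kernel(n, x, nodes=20000):
    """(1/pi) * integral over [-pi, pi] of f(x + t) D_n(t) dt, by the midpoint rule."""
    h = 2 * pi / nodes
    total = 0.0
    for j in range(nodes):
        t = -pi + (j + 0.5) * h
        total += f(x + t) * dirichlet_kernel(n, t)
    return total * h / pi


if __name__ == "__main__":
    for x in (0.0, 1.0, -2.5):
        print(x, s_n_from_coefficients(5, x), s_n_from_kernel(5, x))
```
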

While we know that $s_n(f)$ is a good approximation to $f$ in the $L_2$-norm, a better understanding of its effectiveness as a uniform approximation will require a better understanding of the Dirichlet kernel $D_n$. Figure 15.1 displays the graph of $D_n$ for $n = 30$, while the following are a few important observations about $D_n$ and its integrals.



Lemma 15.2.

(a) $D_n$ is even,

(b) $(1/\pi)\int_{-\pi}^{\pi} D_n(t)\,dt = (2/\pi)\int_0^{\pi} D_n(t)\,dt = 1$,

(c) $|D_n(t)| \le n + \frac{1}{2}$ and $D_n(0) = n + \frac{1}{2}$,

(d) $|\sin(n + \frac{1}{2})t|/t \le |D_n(t)| \le \pi/(2t)$ for $0 < t < \pi$,

(e) If $\lambda_n = (1/\pi)\int_{-\pi}^{\pi} |D_n(t)|\,dt$, then $(4/\pi^2)\log n \le \lambda_n \le 3 + \log n$.

proof. (a), (b), and (c) are relatively clear from the fact that

$$D_n(t) = \frac{1}{2} + \cos t + \cos 2t + \cdots + \cos nt.$$

(Notice, too, that (b) follows from the fact that $s_n(1) = 1$.) For (d) we use a more delicate estimate. Since $2\theta/\pi \le \sin\theta \le \theta$ for $0 \le \theta \le \pi/2$, it follows that $2t/\pi \le 2\sin(t/2) \le t$ for $0 \le t \le \pi$. Hence,

$$\frac{\pi}{2t} \ge \frac{|\sin(n + \frac{1}{2})t|}{2\sin\frac{1}{2}t} \ge \frac{|\sin(n + \frac{1}{2})t|}{t}$$

for $0 < t < \pi$. Next, the upper estimate in (e) is easy:

$$\begin{aligned}
\frac{2}{\pi}\int_0^{\pi} |D_n(t)|\,dt &= \frac{2}{\pi}\int_0^{\pi} \frac{|\sin(n + \frac{1}{2})t|}{2\sin\frac{1}{2}t}\,dt \\
&\le \frac{2}{\pi}\int_0^{1/n} \Big(n + \frac{1}{2}\Big)\,dt + \frac{2}{\pi}\int_{1/n}^{\pi} \frac{\pi}{2t}\,dt \\
&= \frac{2n + 1}{\pi n} + \log\pi + \log n < 3 + \log n.
\end{aligned}$$


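The lower estimate takes more work:

$$\begin{aligned}
\frac{2}{\pi}\int_0^{\pi} |D_n(t)|\,dt &= \frac{2}{\pi}\int_0^{\pi} \frac{|\sin(n + \frac{1}{2})t|}{2\sin\frac{1}{2}t}\,dt \ge \frac{2}{\pi}\int_0^{\pi} \frac{|\sin(n + \frac{1}{2})t|}{t}\,dt \\
&= \frac{2}{\pi}\int_0^{(n + \frac{1}{2})\pi} \frac{|\sin x|}{x}\,dx \ge \frac{2}{\pi}\int_0^{n\pi} \frac{|\sin x|}{x}\,dx \\
&= \frac{2}{\pi}\sum_{k=1}^{n} \int_{(k-1)\pi}^{k\pi} \frac{|\sin x|}{x}\,dx \ge \frac{2}{\pi}\sum_{k=1}^{n} \frac{1}{k\pi}\int_{(k-1)\pi}^{k\pi} |\sin x|\,dx \\
&= \frac{4}{\pi^2}\sum_{k=1}^{n} \frac{1}{k} \ge \frac{4}{\pi^2}\log n,
\end{aligned}$$

because $\sum_{k=1}^{n} (1/k) \ge \log n$. □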
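The bounds in (e) can be checked numerically. The sketch below is our own, with hypothetical helper names. It approximates $\lambda_n = (2/\pi)\int_0^{\pi} |D_n(t)|\,dt$ by the midpoint rule and compares the result with $(4/\pi^2)\log n$ and $3 + \log n$. It uses the closed form (15.6), which is safe here because the midpoints never hit $t = 0$.

```python
from math import log, pi, sin


def dirichlet_closed(n, t):
    """D_n(t) = sin((n + 1/2) t) / (2 sin(t/2)), for t not a multiple of 2pi."""
    return sin((n + 0.5) * t) / (2 * sin(0.5 * t))


def lebesgue_number(n, nodes=20000):
    """lambda_n = (1/pi) * integral over [-pi, pi] of |D_n| = (2/pi) * integral over [0, pi] of |D_n|."""
    h = pi / nodes
    total = sum(abs(dirichlet_closed(n, (j + 0.5) * h)) for j in range(nodes))
    return 2 / pi * total * h


if __name__ == "__main__":
    for n in (2, 8, 32, 128):
        print(n, 4 / pi**2 * log(n), lebesgue_number(n), 3 + log(n))
```

The printed values grow like a constant multiple of $\log n$, in line with (e).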

The numbers $\lambda_n = (1/\pi)\int_{-\pi}^{\pi} |D_n(t)|\,dt$ are called the Lebesgue numbers and serve the following purpose:

Corollary 15.3. If $f \in C^{2\pi}$, then

$$|s_n(f)(x)| \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f(x + t)|\,|D_n(t)|\,dt \le \lambda_n\|f\|_\infty. \tag{15.7}$$

In particular, $\|s_n(f)\|_\infty \le \lambda_n\|f\|_\infty \le (3 + \log n)\|f\|_\infty$.

If we approximate the function $\operatorname{sgn} D_n$ by a continuous function $f$ of norm one, then

$$s_n(f)(0) \approx \frac{1}{\pi}\int_{-\pi}^{\pi} |D_n(t)|\,dt = \lambda_n.$$

Thus, $\lambda_n$ is the smallest constant that works in equation (15.7); see Exercise 8. The fact that $s_n(f)$ may have a very large sup-norm compared to $f$ means, in particular, that $s_n(f)$ is typically a poor approximation to $f$ in the uniform norm. In sharp contrast, recall that in the $L_2$-norm we have $\|s_n(f)\|_2 \le \|f\|_2$. Of course, $s_n(f)$ is a very good approximation to $f$ in the $L_2$-norm.

Now that we have Dirichlet’s formula at our disposal, however, it is not difficult to find conditions under which $s_n(f)$ will converge uniformly to $f$.

Theorem 15.4. Let $f$ be a continuous function on $[-\pi, \pi]$ with $f(-\pi) = f(\pi)$, and suppose that $f$ has a bounded, piecewise continuous derivative on $[-\pi, \pi]$. Then, the Fourier series for $f$ converges uniformly to $f$ on $[-\pi, \pi]$.


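proof. Since $f'$ is piecewise continuous, we may use integration by parts to compare its Fourier coefficients, called $a_n'$ and $b_n'$ here, with those of $f$, which we will call $a_n$ and $b_n$. Notice, for example, that

$$a_n' = \frac{1}{\pi}\int_{-\pi}^{\pi} f'(x)\cos nx\,dx = \frac{1}{\pi}\big[f(\pi)\cos n\pi - f(-\pi)\cos(-n\pi)\big] - \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,d(\cos nx) = \frac{n}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx = nb_n$$

(for $n \ge 1$; the bracketed term vanishes because $f(-\pi) = f(\pi)$). Similarly,

$$b_n' = \frac{1}{\pi}\int_{-\pi}^{\pi} f'(x)\sin nx\,dx = -\frac{n}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx = -na_n.$$

Since the Fourier coefficients of $f'$ are square-summable (by Bessel’s inequality), we conclude that

$$\sum_{n=1}^{\infty} n^2 a_n^2 < \infty \quad\text{and}\quad \sum_{n=1}^{\infty} n^2 b_n^2 < \infty.$$

But now a simple application of the Cauchy-Schwarz inequality tells us that the Fourier coefficients of $f$ must, in fact, be absolutely summable:

$$\sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} n|a_n|\cdot\frac{1}{n} \le \Big(\sum_{n=1}^{\infty} n^2 a_n^2\Big)^{1/2}\Big(\sum_{n=1}^{\infty} \frac{1}{n^2}\Big)^{1/2} < \infty.$$

Similarly, $\sum_{n=1}^{\infty} |b_n| < \infty$. An application of the Weierstrass M-test now shows that the Fourier series for $f$ is uniformly convergent and hence must actually converge to $f$. □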
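As an illustration of Theorem 15.4, consider $f(x) = |x|$ on $[-\pi, \pi]$; this example is ours, not the text’s. It is continuous, satisfies $f(-\pi) = f(\pi)$, and has a bounded, piecewise continuous derivative. A direct computation gives:

- $a_0/2 = \pi/2$,
- $a_k = -4/(\pi k^2)$ for odd $k$,
- $a_k = 0$ for even $k \ge 2$,
- every $b_k = 0$.

So $\sum |a_k| < \infty$, and the M-test bound $\|f - s_n(f)\|_\infty \le \sum_{k>n} |a_k| < 4/(\pi n)$ applies. The sketch below measures the sup-norm error on a grid.

```python
from math import cos, pi


def s_n(n, x):
    """Partial sum of the Fourier series of |x|: pi/2 - (4/pi) * sum over odd k <= n of cos(kx)/k^2."""
    return pi / 2 - 4 / pi * sum(cos(k * x) / k**2 for k in range(1, n + 1, 2))


def sup_error(n, grid_points=400):
    xs = [-pi + 2 * pi * j / grid_points for j in range(grid_points)]
    return max(abs(abs(x) - s_n(n, x)) for x in xs)


if __name__ == "__main__":
    for n in (5, 25, 125):
        print(n, sup_error(n), 4 / (pi * n))
```
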


Note, for example, that Theorem 15.4 holds for polygonal functions, or even for “piecewise polynomial” functions in $C^{2\pi}$, and these collections clearly form dense subsets of $C^{2\pi}$. Theorem 15.4 thus supplies a large class of functions for which $s_n(f)$ converges to $f$. Even so, there are examples available of continuous functions whose Fourier series fail to converge (in fact, we can even arrange for divergence on a dense set of points). In other words, $s_n(f)$ is typically not a good pointwise approximation to $f$, let alone a good uniform approximation. To approximate a continuous function $f$ uniformly by trig polynomials, then, we will need to look for something better than the sequence $s_n(f)$. Said another way: We will need to replace $D_n$ by a better kernel. And this is exactly what we will do.


EXERCISES 

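8. Fix $n \ge 1$ and $\varepsilon > 0$.

(a) Show that there is a continuous function $f \in C^{2\pi}$ satisfying $\|f\|_\infty = 1$ and $(1/\pi)\int_{-\pi}^{\pi} |f(t) - \operatorname{sgn} D_n(t)|\,dt < \varepsilon/(n + 1)$.

(b) Show that $s_n(f)(0) \ge \lambda_n - \varepsilon$ and, hence, that $\|s_n(f)\|_\infty \ge \lambda_n - \varepsilon$.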


Fejér’s Theorem

To motivate our next result, we begin with a simple fact about numerical sequences. We suppose that we are given a sequence of real numbers $(s_n)$ and we consider the sequence formed by their arithmetic means (or Cesàro sums)

$$\sigma_n = \frac{s_1 + s_2 + \cdots + s_n}{n}.$$

The claim here is that the sequence $(\sigma_n)$ has better convergence properties than the original sequence $(s_n)$.

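Lemma 15.5. If $s_n \to s$, then $\sigma_n \to s$.

proof. If $(s_n)$ is convergent, then it is also bounded; let’s say that $|s_n| \le B$ for all $n$. Next, given $\varepsilon > 0$, choose $n$ such that $|s_k - s| < \varepsilon$ for all $k > n$. Fixing this $n$, now consider

$$\sigma_N = \frac{s_1 + s_2 + \cdots + s_N}{N} = \frac{s_1 + \cdots + s_n}{N} + \frac{s_{n+1} + \cdots + s_N}{N}.$$

Clearly, for $N > n$,

$$-\frac{nB}{N} + \frac{N - n}{N}(s - \varepsilon) < \sigma_N < \frac{nB}{N} + \frac{N - n}{N}(s + \varepsilon),$$

and hence

$$s - 2\varepsilon < \sigma_N < s + 2\varepsilon$$

for all $N$ sufficiently large. □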

The point of Lemma 15.5 is that averaging preserves convergence. In fact, it often enhances convergence. In the case of the nonconvergent sequence $s_n = (-1)^n$, it is not hard to check that the corresponding sequence $(\sigma_n)$ converges to 0. In short, averaging cannot hurt and occasionally helps when considering nonconvergent sequences.
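Here is a minimal sketch of the averaging in Lemma 15.5; the function name is ours. It computes the Cesàro means of $s_n = (-1)^n$, which converge to $0$ although $(s_n)$ itself diverges. It also computes the means of the convergent sequence $s_n = 1 + 1/n$, which converge to the same limit $1$.

```python
def cesaro_means(seq):
    """Return [sigma_1, sigma_2, ...] where sigma_n = (s_1 + ... + s_n) / n."""
    means, total = [], 0.0
    for n, s in enumerate(seq, start=1):
        total += s
        means.append(total / n)
    return means


if __name__ == "__main__":
    alternating = [(-1) ** n for n in range(1, 11)]
    print(cesaro_means(alternating))          # -1/n for odd n, 0 for even n
    convergent = [1 + 1 / n for n in range(1, 10001)]
    print(cesaro_means(convergent)[-1])
```
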





Now, since the sequence of partial sums $(s_n(f))$ of a Fourier series need not converge to $f$, we might try looking at the sequence of arithmetic means $(\sigma_n(f))$ defined by

$$\sigma_n(f)(x) = \frac{1}{n}\big(s_0(f) + \cdots + s_{n-1}(f)\big)(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\Big[\frac{1}{n}\sum_{k=0}^{n-1} D_k(t)\Big]dt = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)K_n(t)\,dt,$$

where $K_n = (1/n)(D_0 + D_1 + \cdots + D_{n-1})$ is called Fejér’s kernel. The same techniques that we used earlier can be applied to find a closed form for $\sigma_n(f)$, which, of course, reduces to simplifying $(1/n)(D_0 + D_1 + \cdots + D_{n-1})$. As before, we begin with a trig identity:

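$$2\sin\theta\sum_{k=0}^{n-1} \sin(2k + 1)\theta = \sum_{k=0}^{n-1} \big[\cos 2k\theta - \cos(2k + 2)\theta\big] = 1 - \cos 2n\theta = 2\sin^2 n\theta.$$

Thus, taking $\theta = t/2$ and using (15.6),

$$K_n(t) = \frac{1}{n}\sum_{k=0}^{n-1} \frac{\sin(2k + 1)t/2}{2\sin(t/2)} = \frac{\sin^2(nt/2)}{2n\sin^2(t/2)}. \tag{15.8}$$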


Please note that $K_n$ is an even, nonnegative trig polynomial of degree at most $n - 1$
and satisfies $(1/\pi)\int_{-\pi}^{\pi} K_n(t)\,dt = 1$. (Why?) Figure 15.2 displays the graph of $K_n$ for
$n = 20$.
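As a rough numerical check of (15.8) and of the two properties just claimed, the sketch below (plain Python; the helper names are ours) evaluates $K_n$ both as the average $(D_0 + \cdots + D_{n-1})/n$ and from the closed form. It assumes the real-form normalization $D_k(t) = \frac12 + \sum_{j=1}^{k}\cos jt$, which is the one under which (15.8) holds. It then approximates $(1/\pi)\int_{-\pi}^{\pi} K_n$ by an equally spaced Riemann sum over one period; such a sum is exact (up to rounding) for trig polynomials of degree below the number of sample points, so it should return 1.

```python
import math


def dirichlet(k, t):
    """Real-form Dirichlet kernel D_k(t) = 1/2 + sum_{j=1}^{k} cos(j t)."""
    return 0.5 + sum(math.cos(j * t) for j in range(1, k + 1))


def fejer_average(n, t):
    """K_n(t) computed directly as (D_0(t) + ... + D_{n-1}(t)) / n."""
    return sum(dirichlet(k, t) for k in range(n)) / n


def fejer_closed_form(n, t):
    """K_n(t) from (15.8); at multiples of 2*pi we return the limiting value n/2."""
    s = math.sin(t / 2)
    if abs(s) < 1e-12:
        return n / 2
    return math.sin(n * t / 2) ** 2 / (2 * n * s * s)


def normalized_integral(n, samples=2000):
    """(1/pi) * integral of K_n over [-pi, pi], via a periodic Riemann sum."""
    h = 2 * math.pi / samples
    total = sum(fejer_average(n, -math.pi + i * h) for i in range(samples))
    return total * h / math.pi


n = 20
for t in (0.0, 0.3, 1.0, 2.5, -1.7):
    assert abs(fejer_average(n, t) - fejer_closed_form(n, t)) < 1e-9
    assert fejer_closed_form(n, t) >= 0                                   # nonnegative
    assert abs(fejer_closed_form(n, -t) - fejer_closed_form(n, t)) < 1e-12  # even
print(round(normalized_integral(n), 6))  # 1.0
```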



Now $\sigma_n(f)$ is still a good approximation to $f$ in the $L_2$-norm. Indeed, from Lemma
15.5 we have

$$\|f - \sigma_n(f)\|_2 = \Big\|\frac{1}{n}\sum_{k=0}^{n-1}\big(f - s_k(f)\big)\Big\|_2 \le \frac{1}{n}\sum_{k=0}^{n-1}\|f - s_k(f)\|_2 \to 0$$

as $n \to \infty$ (since $\|f - s_k(f)\|_2 \to 0$). But, more to the point, $\sigma_n(f)$ is actually a good
uniform approximation to $f$, a fact that we will call Fejér’s theorem.


Fejér’s Theorem 15.6. If $f \in C^{2\pi}$, then $\sigma_n(f)$ converges uniformly to $f$ as
$n \to \infty$.
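To see Theorem 15.6 at work numerically, here is a sketch for one concrete function of our own choosing (not taken from the text): $f(x) = |x|$ on $[-\pi, \pi]$, extended $2\pi$-periodically, which lies in $C^{2\pi}$. Its Fourier coefficients are $a_0 = \pi$, $a_k = (2/\pi)\big((-1)^k - 1\big)/k^2$ (so $-4/(\pi k^2)$ for odd $k$ and $0$ for even $k \ge 2$), and all $b_k = 0$. Averaging $s_0(f), \ldots, s_{n-1}(f)$ weights the $k$th term by $1 - k/n$, so $\sigma_n(f)(x) = a_0/2 + \sum_{k=1}^{n-1}(1 - k/n)\,a_k\cos kx$. The code estimates $\|f - \sigma_n(f)\|_\infty$ by sampling one period; the estimates should shrink as $n$ grows.

```python
import math


def cosine_coeff(k):
    """Cosine coefficients a_k of f(x) = |x| on [-pi, pi]; all sine coefficients vanish."""
    if k == 0:
        return math.pi
    return -4 / (math.pi * k * k) if k % 2 == 1 else 0.0


def fejer_mean(n, x):
    """sigma_n(f)(x) = a_0/2 + sum_{k=1}^{n-1} (1 - k/n) a_k cos(kx)."""
    return cosine_coeff(0) / 2 + sum(
        (1 - k / n) * cosine_coeff(k) * math.cos(k * x) for k in range(1, n)
    )


def sup_error(n, samples=1001):
    """Estimate max |f - sigma_n(f)| over one period [-pi, pi] by sampling."""
    xs = [-math.pi + 2 * math.pi * i / (samples - 1) for i in range(samples)]
    return max(abs(abs(x) - fejer_mean(n, x)) for x in xs)


for n in (4, 16, 64, 256):
    print(n, round(sup_error(n), 4))
```

The largest errors sit at the corners $x = 0$ and $x = \pm\pi$, where $f$ fails to be differentiable; the convergence there is slow but still uniform.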





Now Fejér’s theorem is but a single typical example of a larger class of convergence
theorems. This point can be made most clear by isolating the key ingredients in its
proof as a self-contained statement about certain “kernel operators.”

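Theorem 15.7. Suppose that a sequence $(k_n)$ in $C^{2\pi}$ satisfies

(a) $k_n \ge 0$,

(b) $(1/\pi)\int_{-\pi}^{\pi} k_n(t)\,dt = 1$, and

(c) $\int_{\delta \le |t| \le \pi} k_n(t)\,dt \to 0$, as $n \to \infty$, for every $\delta > 0$.

Then $(1/\pi)\int_{-\pi}^{\pi} f(x+t)\,k_n(t)\,dt$ converges uniformly to $f(x)$, as $n \to \infty$, for each $f \in C^{2\pi}$.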


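Proof. Let $\varepsilon > 0$. Since $f$ is uniformly continuous, we may choose $\delta > 0$ so
that $|f(x) - f(x+t)| < \varepsilon$, for all $x$, whenever $|t| < \delta$. Next, we use the fact that
$k_n$ is nonnegative and integrates to 1 to write

$$\begin{aligned}
\Big|f(x) - \frac{1}{\pi}\int_{-\pi}^{\pi} f(x+t)\,k_n(t)\,dt\Big|
&= \Big|\frac{1}{\pi}\int_{-\pi}^{\pi}\big[f(x) - f(x+t)\big]\,k_n(t)\,dt\Big| \\
&\le \frac{1}{\pi}\int_{|t|<\delta}|f(x) - f(x+t)|\,k_n(t)\,dt + \frac{1}{\pi}\int_{\delta\le|t|\le\pi}|f(x) - f(x+t)|\,k_n(t)\,dt \\
&\le \varepsilon + \frac{2\|f\|_\infty}{\pi}\int_{\delta\le|t|\le\pi}k_n(t)\,dt < \varepsilon + \varepsilon = 2\varepsilon
\end{aligned}$$

for all $n$ sufficiently large (independent of $x$). □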


It is easy to see that Fejér’s kernel satisfies the conditions of Theorem 15.7. In particular, (c) follows from the fact that $K_n(t) \to 0$ uniformly on the set $\delta \le |t| \le \pi$. Indeed, since $K_n$ is even and $\sin(t/2)$ increases on $\delta \le t \le \pi$, we have

$$0 \le K_n(t) = \frac{\sin^2(nt/2)}{2n\sin^2(t/2)} \le \frac{1}{2n\sin^2(\delta/2)} \to 0 \qquad (n \to \infty)$$

for all $\delta \le |t| \le \pi$.
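The tail estimate above is easy to probe numerically. In the sketch below (helper names are ours), `tail_sup` samples $K_n$ on $\delta \le t \le \pi$, which suffices because $K_n$ is even, and compares the result with the bound $1/(2n\sin^2(\delta/2))$; the value $\delta = 0.2$ is an arbitrary choice. Both columns should shrink roughly like $1/n$.

```python
import math


def fejer_kernel(n, t):
    """Closed form (15.8); valid for t not a multiple of 2*pi."""
    return math.sin(n * t / 2) ** 2 / (2 * n * math.sin(t / 2) ** 2)


def tail_sup(n, delta, samples=5000):
    """Largest sampled value of K_n on delta <= t <= pi (K_n is even)."""
    step = (math.pi - delta) / samples
    return max(fejer_kernel(n, delta + i * step) for i in range(samples + 1))


def tail_bound(n, delta):
    """The bound 1 / (2n sin^2(delta/2)) from the text."""
    return 1 / (2 * n * math.sin(delta / 2) ** 2)


delta = 0.2
for n in (10, 100, 1000):
    print(n, tail_sup(n, delta), tail_bound(n, delta))
```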


Since $\sigma_n(f)$ is a trig polynomial, notice that Fejér’s theorem implies Weierstrass’s
second theorem. Here, then, is one of the independent proofs of Weierstrass’s second
theorem that we referred to in Chapter Eleven. (Notice too that although we used the
Weierstrass theorem to facilitate our discussion of the $L_2$-theory, the proof of Fejér’s
theorem is self-contained.) As we pointed out in Chapter Eleven, the first Weierstrass
theorem may then also be viewed as a consequence of Fejér’s theorem.


Corollary 15.8. (Weierstrass’s Second Theorem) Given $f \in C^{2\pi}$ and $\varepsilon > 0$,
there is a trig polynomial $T$ such that $\|f - T\|_\infty < \varepsilon$.


Corollary 15.9. (Weierstrass’s First Theorem) Given $f \in C[a,b]$ and $\varepsilon > 0$,
there is a polynomial $p$ such that $\|f - p\|_\infty < \varepsilon$.

You might find it interesting to learn that Fejér was a fourth-year student, only 19
years old, when he proved his result (about 1900) while Weierstrass was 70 at the time
he proved his approximation theorems (about 1885). It is especially interesting when
you consider that, only a few years earlier, Fejér’s teachers had decided he was a weak
student and so should be charged extra tuition!


EXERCISES 

9. Prove that $\|\sigma_n(f)\|_2 \le \|f\|_2$ and $\|\sigma_n(f)\|_\infty \le \|f\|_\infty$.

10.

(a) If $f, k \in C^{2\pi}$, prove that $g(x) = \int_{-\pi}^{\pi} f(x+t)\,k(t)\,dt$ is in $C^{2\pi}$.

(b) If we assume only that $f$ is $2\pi$-periodic and Riemann integrable (but still $k \in C^{2\pi}$), is $g(x) = \int_{-\pi}^{\pi} f(x+t)\,k(t)\,dt$ continuous?

(c) If we simply assume that $f$ and $k$ are $2\pi$-periodic and Riemann integrable, is $g(x) = \int_{-\pi}^{\pi} f(x+t)\,k(t)\,dt$ still continuous? [Hint: See Exercise 6.]

11. Modify the proof of Theorem 15.7 to show that if $f$ is Riemann integrable,
then $(1/\pi)\int_{-\pi}^{\pi} f(x+t)\,k_n(t)\,dt \to f(x)$ pointwise, as $n \to \infty$, at each point of
continuity of $f$. In particular, $\sigma_n(f)(x) \to f(x)$ at each point of continuity of $f$.


Complex Fourier Series 

Lastly, a word or two about Fourier series involving complex coefficients. Most advanced textbooks consider a $2\pi$-periodic function $f : \mathbb{R} \to \mathbb{C}$ and define the Fourier series for $f$ by

$$\sum_{k=-\infty}^{\infty} c_k e^{ikx},$$

where now we have only one formula for the $c_k$:

$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,e^{-ikt}\,dt, \tag{15.9}$$

and where, of course, the $c_k$ are now complex numbers. (The integral of a complex-valued function $g : \mathbb{R} \to \mathbb{C}$ is defined in terms of the real and imaginary parts of $g$, namely, $\int g = \int(\operatorname{Re} g) + i\int(\operatorname{Im} g)$. Thus, in our situation, if we require that both $\operatorname{Re} f$ and $\operatorname{Im} f$ are integrable on $[-\pi, \pi]$, then the integral in equation (15.9) will exist.)

This somewhat simpler approach has other advantages; for one, the exponentials $e^{ikt}$ are now an orthonormal set relative to the normalizing constant $1/2\pi$. Specifically, we now define

$$\langle f, g\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,\overline{g(t)}\,dt$$

and so have

$$\langle e^{int}, e^{imt}\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{int}e^{-imt}\,dt = \begin{cases} 0, & \text{for } m \ne n,\\ 1, & \text{for } m = n.\end{cases}$$
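A quick check of these orthonormality relations, as a sketch whose method is our own choice: the hypothetical helper `inner` approximates $\langle f, g\rangle$ by an equally spaced Riemann sum over one period. For $f = e^{int}$ and $g = e^{imt}$ this sum is exact (up to rounding) whenever $|n - m|$ is smaller than the number of sample points, so the printed table should show 1 on the diagonal and 0 elsewhere.

```python
import cmath
import math


def inner(f, g, samples=64):
    """<f, g> = (1/2pi) * integral over [-pi, pi] of f(t) * conj(g(t)) dt,
    approximated by a Riemann sum at `samples` equally spaced points."""
    h = 2 * math.pi / samples
    ts = [-math.pi + j * h for j in range(samples)]
    total = sum(f(t) * g(t).conjugate() for t in ts)
    return total * h / (2 * math.pi)


def exp_k(k):
    """The function t -> e^{ikt}."""
    return lambda t: cmath.exp(1j * k * t)


for n in range(-3, 4):
    print([round(abs(inner(exp_k(n), exp_k(m))), 9) for m in range(-3, 4)])
```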

And, if we remain consistent with this choice and define the $L_2$-norm by

$$\|f\|_2 = \left(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(t)|^2\,dt\right)^{1/2}, \tag{15.1'}$$

then we have the simpler estimate $\|f\|_2 \le \|f\|_\infty$ for $f \in C^{2\pi}$.

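The Dirichlet and Fejér kernels are essentially the same in this case, except that we now write $s_n(f)(x) = \sum_{k=-n}^{n} c_k e^{ikx}$. Given this, the Dirichlet and Fejér kernels can be written as

$$D_n(x) = \sum_{k=-n}^{n} e^{ikx} = 1 + \sum_{k=1}^{n}\big(e^{ikx} + e^{-ikx}\big) = 1 + 2\sum_{k=1}^{n}\cos kx = \frac{\sin\big(n + \frac12\big)x}{\sin\frac12 x} \tag{15.6'}$$

and

$$K_n(x) = \frac{1}{n}\sum_{m=0}^{n-1} D_m(x) = \frac{1}{n}\sum_{m=0}^{n-1}\frac{\sin\big(m + \frac12\big)x}{\sin\frac12 x} = \frac{\sin^2(nx/2)}{n\sin^2(x/2)}. \tag{15.8'}$$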


In other words, each is twice its real coefficient counterpart. Since the choice of a
normalizing constant ($1/\pi$ versus $1/2\pi$, and sometimes even $1/\sqrt{\pi}$ or $1/\sqrt{2\pi}$) has a
(small) effect on these formulas, you may find some variation in other textbooks.
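Formulas (15.6') and (15.8'), and the factor of 2, can be spot-checked numerically. The sketch below (our own helper names) sums $D_n$ and $K_n$ directly from their definitions and compares them with the closed forms, and with twice the real-coefficient Fejér kernel (15.8), at a few arbitrarily chosen points away from multiples of $2\pi$.

```python
import cmath
import math


def dirichlet_complex(n, x):
    """D_n(x) = sum_{k=-n}^{n} e^{ikx}; the imaginary parts cancel, so keep the real part."""
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real


def fejer_complex(n, x):
    """K_n(x) = (D_0(x) + ... + D_{n-1}(x)) / n, complex normalization."""
    return sum(dirichlet_complex(m, x) for m in range(n)) / n


def fejer_real(n, x):
    """Real-coefficient Fejer kernel (15.8)."""
    return math.sin(n * x / 2) ** 2 / (2 * n * math.sin(x / 2) ** 2)


def check(n, x, tol=1e-9):
    d_closed = math.sin((n + 0.5) * x) / math.sin(x / 2)               # (15.6')
    k_closed = math.sin(n * x / 2) ** 2 / (n * math.sin(x / 2) ** 2)   # (15.8')
    assert abs(dirichlet_complex(n, x) - d_closed) < tol
    assert abs(fejer_complex(n, x) - k_closed) < tol
    assert abs(fejer_complex(n, x) - 2 * fejer_real(n, x)) < tol       # twice (15.8)


for x in (0.4, 1.3, 2.9, -2.2):
    check(12, x)
print("checks passed")
```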


EXERCISE 


12. Show that we may also write $K_n(x) = \sum_{k=-n}^{n}\big(1 - |k|/n\big)\,e^{ikx}$.


Notes and Remarks 

The books by Carslaw [1930], Folland [1992], Jackson [1941], Körner [1988],
Rogosinski [1950], Tolstov [1962], and Zygmund [1935] will supply you with a wealth
of additional information about Fourier series and their applications. You will find
discussions of Fourier series along with several related topics in Cheney [1966] and
Natanson [1964]. For more on the history of Fourier series (and Fourier himself), see
Birkhoff [1973], Carslaw [1930], Gibson [1893], Gonzalez-Velasco [1992], Grattan-Guinness [1972], Hawkins [1970], Herivel [1975], Hobson [1927], Jeffery [1956], Kline
[1972], Körner [1988], Langer [1947], Rogosinski [1950], and Van Vleck [1914].

The “barking up the wrong tree” quote is from Halmos [1978].

The early history of Fourier analysis was largely concerned with questions of
existence, uniqueness, and pointwise convergence of the series. For example,
Dirichlet [1829] proved that piecewise monotone functions are represented by their
Fourier series. Jordan [1881] was able to generalize this result to functions of bounded
variation (which were introduced for just this purpose). The Dirichlet-Jordan theorem
states, in part, that if $f$ is $2\pi$-periodic and of bounded variation on $[-\pi, \pi]$, then $s_n(f)(\theta)$
converges to $[f(\theta-) + f(\theta+)]/2$, as $n \to \infty$, for each $\theta$ in $[-\pi, \pi]$. A similar result
is given by the Dini-Lipschitz theorem, which, in part, states that if $f$ is $2\pi$-periodic
and satisfies a Lipschitz condition of order $\alpha > 0$ on $[-\pi, \pi]$, then $s_n(f)$ converges
pointwise to $f$. See, for example, Rogosinski [1950] for further details. Simple proofs
of pointwise convergence (under various hypotheses) are given in Chernoff [1980],
Franklin [1924], and Jackson [1926, 1934].

Real progress in these delicate matters would wait until the introduction of the
Lebesgue integral in 1903. As one of the earliest applications of the new integral, F.
Riesz [1906] introduced the $L_2$-theory. Once the $L_2$-theory was in place, the emphasis
in Fourier analysis began to shift toward other issues. We will have much more to say
about these issues later; the Lebesgue integral is the focus of the upcoming chapters.
For a quick (and unusual) derivation of the Lebesgue integral based on what we already
know about the $L_2$-theory, see Van Daele [1990].

The notation $f(x) \sim (a_0/2) + \sum_{k=1}^{\infty}(a_k\cos kx + b_k\sin kx)$, which is used to emphasize the fact that the Fourier series for $f$ is a valid representation for $f$ regardless
of whether or not it actually converges pointwise to $f$, is apparently due to Hurwitz
[1903]. The result in Exercise 2 is one of Fourier’s original examples. Riemann’s lemma
is from his 1854 Habilitationsschrift; see Riemann [1902, pp. 227-265]. Also see the
excerpt, “Riemann on Fourier series and the Riemann integral,” in Birkhoff [1973].

Corollary 15.3 (in a slightly different form) is due to Lebesgue [1906]. The proof of
Theorem 15.4 is taken from Simon [1969]. There are several elementary convergence
theorems of this type in Jackson [1941] and Rogosinski [1950]. Theorem 15.6 is, of
course, due to the great Hungarian mathematician Lipót (Leopold) Fejér; his original
proof in Fejér [1900] is amplified in Fejér [1904]. For more on Fejér himself, see Hersh
and John-Steiner [1993] (and its references). Fejér’s theorem fits the mold of Korovkin’s
theorem (see the notes at the end of Chapter Eleven); for a proof along these lines, see
Cheney [1966].


Lebesgue Measure


The Problem of Measure 

If you will recall Fourier’s “proof” that every bounded function has a Fourier series,
a central problem is to justify the term-by-term integration of a series of functions.
Specifically, if we suppose that $(f_n)$ is a sequence of integrable, or even continuous
functions, is it true that

$$\int_a^b\Big(\sum_{n=1}^{\infty} f_n(x)\Big)\,dx = \sum_{n=1}^{\infty}\Big(\int_a^b f_n(x)\,dx\Big)?$$

For that matter, is $\sum f_n$ even integrable? And, as long as we’re at it, what does it mean
to say that a function is integrable?

These are a few of the questions that Bernhard Riemann set out to answer in “Über
die Darstellbarkeit einer Function durch eine trigonometrische Reihe” (On the development of a function by a trigonometric series), a paper submitted in 1854 as part of
his Habilitationsschrift, or “inauguration” examination. Riemann’s work on the convergence of series, along with his concept of an integrable function, was a direct response
to the problems posed by Fourier’s proof. The paper was deemed incomplete in many
ways, raising more questions than it answered, and it remained unpublished until 1867,
one year after Riemann’s death. Nevertheless, the publication of Riemann’s paper is
considered a landmark in the history of analysis. According to Grattan-Guinness:

Soon Weierstrass’s pupils were all working on problems in analysis inspired by
Riemann; infinitely oscillatory and/or discontinuous functions; continuous
non-differentiable functions; modes of uniform and nonuniform convergence; point
discontinuities of Fourier series; and so on. This was the 1870s, the time of
Hankel’s contemporaries: The age of Bolzano’s “pure analysis” had arrived with
a vengeance.

In our time, the Riemann integral has surely become the workhorse of calculus.
While this noble beast is a faithful and true servant, it is not without its shortcomings:
not entirely flawed, mind you, just less than perfect. One such blemish, if you
will, is that the Riemann integral is not defined for as many functions as we might
hope. To better understand this, let’s take another look at how the Riemann integral is
computed.

Given a nonnegative, bounded function $f$ on $[a, b]$ and a partition $P$ of $[a, b]$, we
effectively construct a step function $g$ approximating $f$ and estimate the area under the
graph of $f$ by the area under the graph of the step function (Figure 16.1).



As we saw in the last chapter, the Riemann integral of $f$ over $[a, b]$ exists if these
approximate areas tend to a specific finite limit as $\max_{1 \le i \le n}\Delta x_i \to 0$. What’s more, the
existence of such a limit requires that the oscillation of $f$ be relatively small at “most”
points in the interval $[a, b]$. In short, the Riemann integral exists only for functions that
are “almost continuous.” We will make this notion precise later, but for now recall that
the characteristic function of the rationals in $[a, b]$ fails to be Riemann integrable in
spite of the fact that it differs from a continuous function, namely 0, at only countably
many points. Evidently, “almost continuous” is a rather restrictive notion.

Said another way, if the difference of upper and lower sums for $f$ is to tend to 0,
then $f$ will have to be the “almost uniform” limit of a sequence of step functions on
$[a, b]$. Again, while a precise statement will have to wait, notice that the characteristic
function of the rationals in $[a, b]$ is clearly the pointwise limit of a sequence of step
functions, each having zero integral.

Either of these heuristic characterizations helps to explain a second shortcoming of 
the Riemann integral: While the Riemann integral easily commutes with uniform limits, 
it is very difficult to work with where pointwise limits are concerned. In this game of in- 
terchanging limits, it would be useful to have a more generous integral. Enter Lebesgue. 

In 1902, Henri Lebesgue published his thesis, “Intégrale, longueur, aire,” in which he
presented an extension of the Riemann integral. The Lebesgue integral is defined for
what are called “measurable” functions, a class that includes the Riemann integrable
functions; the new integral reduces to the old in all of the familiar cases.

Lebesgue’s ideas were influenced by the earlier works of Jordan and Borel, and
were largely founded on preserving a geometric interpretation of length and area. He
addressed a variety of issues that, at the time, were not associated with the integral,
in particular, surface area and curve length. Moreover, Lebesgue’s approach gave new
insights into the differentiability of monotone functions and yielded an extension of the
fundamental theorem of calculus.

The Lebesgue integral overcomes at least one other shortcoming of the Riemann
integral: It is very easy to establish the so-called “bounded convergence theorem” for
the Lebesgue integral. Specifically, if $(f_n)$ is a sequence of measurable functions such
that $\big(\sum_{k=1}^{n} f_k\big)_{n=1}^{\infty}$ is a uniformly bounded sequence converging pointwise on $[a, b]$, then
the limit $\sum f_n$ is necessarily also measurable and satisfies

$$\int_a^b\Big(\sum_{n=1}^{\infty} f_n(x)\Big)\,dx = \sum_{n=1}^{\infty}\Big(\int_a^b f_n(x)\,dx\Big).$$

While a similar result is known to hold for the Riemann integral, it is much harder to
prove.

Lebesgue’s theory of integration provided the ideal tool for research into the trouble- 
some issues surrounding trigonometric series. Lebesgue himself would lead the way. 
By 1910, the Lebesgue integral was firmly established in the research community, and 
by 1930 it had found its way into several popular textbooks. 

The problem of integration, as Lebesgue called it, is to assign to each bounded
function $f$, defined on some interval, and each pair of real numbers $a$ and $b$, a finite
number $\int_a^b f(x)\,dx$ in such a way that the following six conditions are satisfied:

1. $\int_a^b f(x)\,dx = \int_{a+h}^{b+h} f(x - h)\,dx$, for any $a$, $b$, and $h$.

2. $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx + \int_c^a f(x)\,dx = 0$, for any $a$, $b$, and $c$.

3. $\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$, for any $f$ and $g$.

4. $\int_a^b f(x)\,dx \ge 0$ whenever $a < b$ and $f \ge 0$.

5. $\int_0^1 1\,dx = 1$.

6. If $f_n(x)$ increases pointwise to $f(x)$, then $\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx$.

These six conditions are what Lebesgue took to be the minimal set of requirements for
a “reasonable” integral. And the six conditions are independent; that is, it is possible to
define the number $\int_a^b f(x)\,dx$ in such a way that it will satisfy any five given conditions
but fail to satisfy the remaining sixth condition.

Asking for a “reasonable” integral that is defined for all bounded functions may 
be optimistic, but is worth shooting for. The first five conditions are clearly desirable, 
and the Riemann integral already satisfies these. Thus, we are asking for an extension 
of the Riemann integral that is defined on as large a class of functions as we can 
manage, which preserves the “nice” properties of the Riemann integral and which will, 
in addition, commute with at least certain limits. 

We can paraphrase Lebesgue’s own description of his concept of integrability by
making a slight revision to our simplistic description of Riemann’s definition. To lift the
burden of continuity from the integrand $f$, Lebesgue’s approach is to again approximate
$f$ by a simpler function, but this time subdividing the $y$-axis rather than the $x$-axis! See
Figure 16.2.

If $P = \{y_0, \ldots, y_n\}$ is a partition of an interval on the $y$-axis containing the range of
$f$, then we might approximate $f$ by the function

$$g = \sum_{i=1}^{n} c_i\,\chi_{\{y_{i-1} \le f < y_i\}},$$

where $c_i \in [y_{i-1}, y_i)$, and where $\{y_{i-1} \le f < y_i\}$ is shorthand for the set $E_i = \{x : y_{i-1} \le f(x) < y_i\}$. Now the premise here is that the integral of $g$ is unambiguously
defined; by rights it ought to be

$$\int_a^b g = \sum_{i=1}^{n} c_i\,m(E_i),$$

where $m(E)$ denotes the “length” or “measure” of a subset $E$ of $[a, b]$. Assuming that
we can do this, we would then define

$$\int_a^b f = \lim_{\|P\| \to 0}\sum_{i=1}^{n} c_i\,m(E_i).$$

And what do we gain by this new approach? Well, we are no longer speaking of changes
in $f$ relative to small changes in $x$ (which suggests continuity); rather, we are speaking
of changes in $f$ arising from measurable changes in $x$.
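To make the recipe concrete, here is a sketch for one example of our own choosing, $f(x) = x^2$ on $[0, 1]$. Each set $E_i = \{y_{i-1} \le f < y_i\}$ is then the interval $[\sqrt{y_{i-1}}, \sqrt{y_i})$, so its length $m(E_i)$ is known exactly and no theory of measure is needed yet. We take the equally spaced partition $y_i = i/n$ of $[0, 1]$ and $c_i = y_{i-1}$. Since $0 \le f - g < 1/n$ except at the single point $x = 1$ (which lies in no $E_i$ and has length 0), the sums should approach $\int_0^1 x^2\,dx = 1/3$ from below, within $1/n$.

```python
import math


def lebesgue_sum(n):
    """sum_{i=1}^{n} c_i * m(E_i) for f(x) = x^2 on [0, 1], with y_i = i/n and c_i = y_{i-1}.

    Here E_i = {x in [0, 1] : y_{i-1} <= x^2 < y_i} = [sqrt(y_{i-1}), sqrt(y_i)),
    so m(E_i) = sqrt(y_i) - sqrt(y_{i-1}).
    """
    total = 0.0
    for i in range(1, n + 1):
        y_prev, y_i = (i - 1) / n, i / n
        total += y_prev * (math.sqrt(y_i) - math.sqrt(y_prev))
    return total


for n in (10, 100, 1000):
    print(n, lebesgue_sum(n), 1 / 3 - lebesgue_sum(n))
```

Notice that nothing here asked how $f$ varies with $x$; only the lengths of the level sets $E_i$ entered.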

What we will find is that Lebesgue’s integral is defined for a larger class of functions
than is Riemann’s, indeed, for any bounded function for which sets of the form $\{c \le f < d\}$ are always measurable. The trade-off, of course, is that we will have to decide
what is meant by the “measure” of a set, and which sets, if any, can be so measured.

Note that Lebesgue’s approach reduces the problem of integration to that of defining
the integral for two-valued functions of the form $\chi_E$. That is, we need to find a suitable
method of defining the number $m(E) = \int_a^b \chi_E$. In this way, the problem of integration
becomes the problem of measure.

The problem is to assign to each subset $E$ of $\mathbb{R}$ a nonnegative number $m(E)$, called
the measure of $E$, in such a way that the following properties are satisfied:


1° $m([0, 1]) = 1$.

2° $m(E + h) = m(E) = m(-E)$, where $E + h = \{x + h : x \in E\}$ and $-E = \{-x : x \in E\}$; that is, geometrically congruent sets should have the same measure.

3° If $(E_n)$ is any sequence, finite or infinite, of pairwise disjoint sets, then

$$m\Big(\bigcup_{n\ge1} E_n\Big) = \sum_{n\ge1} m(E_n).$$

Condition 1° obviously replaces condition 5 in the problem of integration, while con- 
dition 2° replaces condition 1 . Condition 3° will ultimately replace condition 6. The 
three together will imply that the measure of an interval is simply its length, and that the 
measure of a bounded set is at least finite. It is the last condition, condition 3°, that marks 
Lebesgue’s point of departure from what had gone on before. The geometric notions 
of length, area, and volume only call for those measures to be additive across finitely 
many disjoint objects. Based on Borel’s work on the problem, though, Lebesgue knew 
that he must consider countably additive measures, for it is precisely this last condition 
that permits Lebesgue’s integral to commute with certain pointwise limits. 

Unfortunately, the three conditions are not only independent; they are also inconsistent with the Axiom of Choice. As we will see later, there is no solution to the problem
of measure if we allow the Axiom of Choice (and we do!). Something has to give.
For example, we might consider discarding condition 2°, or perhaps weakening condition 3° by only requiring finitely additive measures. But neither of these options is satisfactory. Assuming the Continuum Hypothesis, it can be shown that there is no countably additive measure defined on all subsets of $[0, 1]$ satisfying both $m([0, 1]) = 1$
and $m(\{x\}) = 0$ for every $x$ in $[0, 1]$.

And the outlook is bleak even if we settle for only finitely additive measures, at least
in $\mathbb{R}^3$. Consider the Banach-Tarski paradox:

Let $U$ and $V$ be nonempty, bounded, open sets in $\mathbb{R}^n$, where $n \ge 3$. Then, there
exist a $k \in \mathbb{N}$ and partitions $E_1, \ldots, E_k$ and $F_1, \ldots, F_k$ of $U$ and $V$, respectively,
into an equal number of disjoint subsets such that $E_i$ is congruent to $F_i$ for each
$i = 1, \ldots, k$.

Hence, an orange may be cut into finitely many pieces that could then be reassembled to
form a citrus behemoth the size of the sun! Obviously, this result precludes the existence
of a nonzero, finitely additive measure, defined on all subsets of $\mathbb{R}^3$, that assigns equal
measure to congruent sets.

Well, OK. So we can’t have everything. But rather than sacrifice any of the three 
geometrically aesthetic conditions that we have asked our measure to satisfy, we will 
instead restrict its domain. That is, we will not insist that m be defined on all subsets 
of R. We’ll ultimately settle for a measure defined only on certain “good” sets. What 
we will find is that there are plenty of “good” sets around to do analysis and that, after 
all, is what we came here for. 

The problem of measure is important from a couple of points of view. For one, 
the concept of defining a measure in terms of a list of requirements, rather than by 
simply providing constructive examples and verifying their properties, was brand new 
in Lebesgue’s time. Proclaiming in advance what properties are required of a solution 
lends a new dimension to the problem; by displaying the key issues, the problem 
becomes easier to generalize or to abstract. Although we are quite used to the axiomatic 
approach by now, it was still a novelty at the turn of the century. Equally important is 
the fact that a problem of calculus, of functions and integrals, has been transformed
into a problem about abstract sets.


EXERCISE 

1. Let $f$ be a nonnegative bounded function on $[a, b]$ with $0 \le f \le M$. Let

$$E_{n,k} = \Big\{\frac{kM}{2^n} \le f < \frac{(k+1)M}{2^n}\Big\}$$

for each $n = 1, 2, \ldots$, and $k = 0, 1, \ldots, 2^n$, and set $\varphi_n = \sum_{k=0}^{2^n}(kM/2^n)\,\chi_{E_{n,k}}$.
Prove that $0 \le \varphi_n \le \varphi_{n+1} \le f$ and that $0 \le f - \varphi_n \le 2^{-n}M$ for each $n$.
Thus, $(\varphi_n)$ converges uniformly to $f$ on $[a, b]$. [Hint: Notice that $E_{n,k} = E_{n+1,2k} \cup E_{n+1,2k+1}$.]
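The claims in Exercise 1 can be spot-checked, though of course not proved, with exact rational arithmetic. On $E_{n,k}$ the function $\varphi_n$ takes the value $kM/2^n$ with $k = \lfloor 2^n f/M \rfloor$, so the hypothetical helper `phi` below needs only the value of $f$ at a point. The sample $f(x) = 4x(1 - x)$ on $[0, 1]$ with $M = 1$ is our own choice; it attains the value $M$ at $x = 1/2$, which exercises the top set $E_{n,2^n} = \{f = M\}$.

```python
from fractions import Fraction
import math


def phi(n, value, M):
    """phi_n at a point where f equals `value` (0 <= value <= M): the point lies in
    E_{n,k} for k = floor(2^n * value / M), and phi_n there equals k M / 2^n."""
    k = math.floor(value * 2**n / M)
    return k * M / 2**n


M = Fraction(1)


def f(x):
    """A sample function with 0 <= f <= M = 1 on [0, 1]."""
    return 4 * x * (1 - x)


points = [Fraction(i, 97) for i in range(98)] + [Fraction(1, 2)]
for n in range(1, 10):
    for x in points:
        y = f(x)
        assert 0 <= phi(n, y, M) <= phi(n + 1, y, M) <= y   # monotone and below f
        assert 0 <= y - phi(n, y, M) <= M / 2**n            # uniform error bound
print("all checks passed")
```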


Lebesgue Outer Measure 

In this section we take a first step toward extending the notion of length. To begin, let’s
agree that the word interval means a bounded, nonempty interval, that is, any one of
the sets $[a, b]$, $(a, b)$, $[a, b)$, or $(a, b]$, where $a$ and $b$ are finite real numbers with $a < b$.
If $I$ is any one of these four sets, we will use the shorthand $\ell(I) = b - a$ to denote
the length of $I$. We will call sets of the form $(-\infty, b]$, $(a, \infty)$, and so on, unbounded
intervals and put $\ell(I) = \infty$ in any of these cases. In short, the word interval, with no
additional quantifier, always means a bounded interval.

Now the notion of length obviously extends to finite unions of pairwise disjoint
intervals. But, in fact, it extends unambiguously to all countable unions of pairwise
disjoint intervals. Indeed, we simply take the sum of the lengths of the constituent
intervals as the “total length” of the union. In general, though, given countably many
intervals $(I_n)$, not necessarily disjoint, the sum $\sum_{n=1}^{\infty}\ell(I_n)$ will be an overestimate for
the total length of their union $\bigcup_{n=1}^{\infty} I_n$. The following proposition (which is obvious for
finite collections of intervals) justifies this claim.

Proposition 16.1. Let $(I_n)$ and $(J_k)$ be sequences of intervals such that $\bigcup_{n=1}^{\infty} I_n = \bigcup_{k=1}^{\infty} J_k$. If the $I_n$ are pairwise disjoint, then $\sum_{n=1}^{\infty}\ell(I_n) \le \sum_{k=1}^{\infty}\ell(J_k)$. Thus, if
the $J_k$ are also pairwise disjoint, then the two sums are equal.

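Proof. Suppose, to the contrary, that $\sum_{n=1}^{\infty}\ell(I_n) > \sum_{k=1}^{\infty}\ell(J_k)$. Then, for some
$N$, we must have $\sum_{n=1}^{N}\ell(I_n) > \sum_{k=1}^{\infty}\ell(J_k)$. Of course, we also have $\bigcup_{n=1}^{N} I_n \subset \bigcup_{k=1}^{\infty} J_k$. But now, by expanding each $J_k$ slightly and shrinking each $I_n$ slightly,
we may suppose that the $J_k$ are open and the $I_n$ are closed. (How?) Thus, the $J_k$
form an open cover for the compact set $\bigcup_{n=1}^{N} I_n$. And here is the contradiction:
Since we have $\sum_{n=1}^{N}\ell(I_n) > \sum_{k=1}^{M}\ell(J_k)$ for any $M$, the finite case shows that no finite subfamily $J_1, \ldots, J_M$ covers $\bigcup_{n=1}^{N} I_n$; that is, the sets $(J_k)$ form an open
cover for $\bigcup_{n=1}^{N} I_n$ that admits no finite subcover. □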

Now we are ready to extend $\ell$ to arbitrary subsets of $\mathbb{R}$. Given a subset $E$ of $\mathbb{R}$, we
define the (Lebesgue) outer measure of $E$ by

$$m^*(E) = \inf\Big\{\sum_{n=1}^{\infty}\ell(I_n) : E \subset \bigcup_{n=1}^{\infty} I_n\Big\},$$

where the infimum is taken over all coverings of $E$ by countable unions of intervals.
Thus, the outer measure of $E$ is the infimum of certain overestimates for the “length”
of $E$. Before we say more, let’s check a few simple properties of $m^*$.

Proposition 16.2.

(i) $0 \le m^*(E) \le \infty$ for any $E$.

(ii) If $E \subset F$, then $m^*(E) \le m^*(F)$.

(iii) $m^*(E + x) = m^*(E)$, where $E + x = \{e + x : e \in E\}$.

(iv) $m^*(E) = 0$ for any countable set $E$.

(v) $m^*(E) < \infty$ for any bounded set $E$.

(vi) $m^*(E) = \inf\big\{\sum_{n=1}^{\infty}(b_n - a_n) : E \subset \bigcup_{n=1}^{\infty}(a_n, b_n)\big\}$.

Proof. The first three properties are nearly immediate from the definition of $m^*$
and are left as exercises. For (iv), suppose that $E = \{e_1, e_2, \ldots\}$. Given $\varepsilon > 0$,
notice that $E \subset \bigcup_{n=1}^{\infty}(e_n - 2^{-n}\varepsilon, e_n + 2^{-n}\varepsilon)$, and hence that $m^*(E) \le 2\varepsilon$. Next,
for (v), note that if $E$ is bounded, then $E \subset [a, b]$ for some (finite) $a < b$. Thus,
$m^*(E) \le b - a < \infty$. Finally, given $E \subset \mathbb{R}$, notice that we always have

$$m^*(E) \le \inf\Big\{\sum_{n=1}^{\infty}(b_n - a_n) : E \subset \bigcup_{n=1}^{\infty}(a_n, b_n)\Big\}.$$

To establish the reverse inequality, then, it is enough to consider the case $m^*(E) < \infty$. (Why?) Now, given $\varepsilon > 0$, choose a sequence of intervals $(I_n)$ covering $E$ such
that $\sum_{n=1}^{\infty}\ell(I_n) < m^*(E) + \varepsilon$. For each $n$, let $J_n$ be an open interval containing
$I_n$ with $\ell(J_n) < \ell(I_n) + 2^{-n}\varepsilon$. Then, $(J_n)$ covers $E$ and $\sum_{n=1}^{\infty}\ell(J_n) < m^*(E) + 2\varepsilon$.
This proves (vi). □
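The covering used for (iv) is easy to make concrete. The sketch below is our own illustration: the hypothetical helper `rationals_in_unit_interval` lists the first 200 rationals in $[0, 1]$ (by increasing denominator, each once), the $n$th one is surrounded by the interval $(e_n - 2^{-n}\varepsilon, e_n + 2^{-n}\varepsilon)$, and exact arithmetic confirms that the total length is $2\varepsilon(1 - 2^{-200}) < 2\varepsilon$. The same bound holds at every finite stage, which is why $m^*(E) \le 2\varepsilon$.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval(count):
    """The first `count` rationals p/q in [0, 1], ordered by q, each in lowest terms."""
    out = []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            if gcd(p, q) == 1:
                out.append(Fraction(p, q))
                if len(out) == count:
                    break
        q += 1
    return out


eps = Fraction(1, 1000)
points = rationals_in_unit_interval(200)
# the n-th point (n = 1, 2, ...) gets an open interval of length 2 * 2^{-n} * eps
cover = [(e - eps / 2**n, e + eps / 2**n) for n, e in enumerate(points, start=1)]
total_length = sum(b - a for a, b in cover)

assert all(a < e < b for (a, b), e in zip(cover, points))
assert total_length == 2 * eps * (1 - Fraction(1, 2**200))
print(float(total_length), "<", float(2 * eps))
```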

Examples 16.3 

(a) Please note that there are unbounded sets with finite outer measure. A rather
spectacular example is $m^*(\mathbb{Q}) = 0$. There are also uncountable sets with outer
measure zero; recall from Chapter Two that the Cantor set $\Delta$ has outer measure
zero. Indeed, for each $n$, the Cantor set is contained in a finite union of intervals
of total length $2^n/3^n$. Thus, $m^*(\Delta) \le 2^n/3^n \to 0$.

(b) Sets of outer measure zero, or null sets, play an important role in analysis;
they provide another notion of “small” or “negligible” sets. Based on the two
examples we have at hand, this makes for a curious comparison. From the point
of view of cardinality, $\Delta$ is big (uncountable) while $\mathbb{Q}$ is small (countable); from
a topological point of view, $\Delta$ is small (nowhere dense) while $\mathbb{Q}$ is big (dense);
and from the point of view of measure, both $\Delta$ and $\mathbb{Q}$ are small (measure zero)!
You will find further curiosities of this sort in the exercises.

(c) Quite often we encounter properties that hold everywhere except on a set of
measure zero. We say that such a property holds almost everywhere, abbreviated
“a.e.” (Some authors use “almost all” or “almost always,” abbreviated “a.a.,”
while probabilists use “almost surely,” abbreviated “a.s.” In some older books the
abbreviation “p.p.” is used, for the original French “presque partout.”) By way of
an example, notice that the Cantor function $f : [0, 1] \to [0, 1]$ satisfies $f' = 0$
almost everywhere, since $f$ is constant on each subinterval of the complement
of $\Delta$.

(d) From Proposition 16.2 (iv), any countable set of exceptions would come under
the almost everywhere banner. For instance, we might say that $\chi_{\mathbb{Q}} = 0$ almost
everywhere, or that a monotone function $f$ is continuous almost everywhere,
that is, $m^*(D(f)) = 0$.

(e) The point to statement (vi) of Proposition 16.2 is that the definition of m* has 
little to do with the particular type of intervals used; we might just as well have 
taken closed intervals. The advantage to using open intervals is that we now 
have a connection between the geometry of R (length) and the topology of R 
(open sets). We will have more to say about this observation later. 

(f) Lebesgue originally defined outer measure for subsets $E$ of a bounded interval
$[a, b]$. In this case, he also defined the inner measure of $E$ as $m_*(E) = b - a - m^*([a, b] \setminus E)$. It is not hard to see that $m_*(E) \le m^*(E)$; that is, inner
measure is an underestimate of the “true” length of $E$ while outer measure is an
overestimate (see Exercise 7).
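The estimate in Example 16.3 (a) can be reproduced exactly. In this sketch (the function name is ours), each round removes the open middle third of every remaining closed interval; after $n$ rounds we are left with $2^n$ intervals of length $3^{-n}$, a finite union covering $\Delta$ with total length $2^n/3^n$.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals [a, b] left after n rounds of deleting open middle thirds from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))   # keep the left third
            next_intervals.append((b - third, b))   # keep the right third
        intervals = next_intervals
    return intervals


for n in (1, 4, 8, 12):
    stage = cantor_stage(n)
    total = sum(b - a for a, b in stage)
    print(n, len(stage), total, float(total))
```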


Next, let’s check that outer measure truly is an extension of length. 

Proposition 16.4. $m^*(I) = \ell(I)$ for any interval $I$, bounded or not.

Proof. The heart of the matter is checking that the proposition holds for compact
intervals, that is, $m^*([a, b]) = b - a$. Assuming that we have done this, let’s see
how this special case settles all other cases.

First, if $I$ is unbounded, then $I$ contains compact intervals of length $n$ for any
$n \ge 1$. By monotonicity (Proposition 16.2 (ii)), $m^*(I) \ge n$ for any $n$; hence,
$m^*(I) = \infty = \ell(I)$.

Next, if $I$ is a bounded, noncompact interval with endpoints $a < b$, then
$[a + \varepsilon/2, b - \varepsilon/2] \subset I \subset [a, b]$ for any $0 < \varepsilon < b - a$. Again using monotonicity, it
follows that $b - a - \varepsilon \le m^*(I) \le b - a$ for any such $\varepsilon$; that is, $m^*(I) = b - a = \ell(I)$.

So let’s get to work! Let $I = [a, b]$. Since $I$ is itself one of the candidate
intervals used in computing $m^*(I)$, we certainly have $m^*(I) \le b - a$; we need
to check that $m^*(I) \ge b - a$. Now, given $\varepsilon > 0$, Proposition 16.2 (vi) supplies
a sequence of open intervals $(a_n, b_n)$ such that $I \subset \bigcup_{n=1}^{\infty}(a_n, b_n)$ and $m^*(I) \ge \sum_{n=1}^{\infty}(b_n - a_n) - \varepsilon$. Since $I$ is compact, we know that there are finitely many open
intervals here that will cover $I$, say $I \subset \bigcup_{i=1}^{n}(a_i, b_i)$. By discarding any extraneous
intervals and relabeling, if necessary, we may suppose that $a_1 < a_2 < \cdots < a_n$ and
that $(a_i, b_i) \cap I \ne \emptyset$ for each $i = 1, \ldots, n$. But $I$ is connected! Thus, consecutive
intervals from $(a_1, b_1), \ldots, (a_n, b_n)$ must actually overlap; that is, $\bigcup_{i=1}^{n}(a_i, b_i)$ must
be an interval containing $I$. (Why?) Hence, $\sum_{n=1}^{\infty}(b_n - a_n) \ge \sum_{i=1}^{n}(b_i - a_i) \ge \ell(I) = b - a$ and so $m^*(I) \ge b - a - \varepsilon$. □
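The “consecutive intervals must overlap” step is essentially an algorithm. The sketch below (our own, with hypothetical names) sorts finitely many open intervals by left endpoint and extends a chain starting at $a$, in the spirit of the connectedness argument; it reports whether the intervals cover $[a, b]$. The loop that follows tests the conclusion of the proof, $\sum(b_i - a_i) > b - a$, on randomly generated finite covers of $[0, 1]$, using exact rational endpoints.

```python
from fractions import Fraction
import random


def covers(intervals, a, b):
    """True if the open intervals (l, r) in `intervals` cover the closed interval [a, b]."""
    reach = a  # invariant: [a, reach) lies inside the union of the intervals used so far
    for l, r in sorted(intervals):
        if l >= reach:
            break              # the point `reach` lies in none of the remaining intervals
        reach = max(reach, r)  # (l, r) overlaps the chain, extending it to [a, max(reach, r))
    return reach > b


# the first list covers [0, 1]; in the second, the point 1/2 lies in neither interval
assert covers([(-0.1, 0.4), (0.3, 0.8), (0.7, 1.05)], 0, 1)
assert not covers([(-0.1, 0.5), (0.5, 1.1)], 0, 1)

random.seed(0)
found = 0
for _ in range(5000):
    intervals = []
    for _ in range(random.randint(1, 6)):
        left = Fraction(random.randint(-20, 110), 100)
        intervals.append((left, left + Fraction(random.randint(1, 60), 100)))
    if covers(intervals, Fraction(0), Fraction(1)):
        found += 1
        assert sum(r - l for l, r in intervals) > 1
print(found, "random covers of [0, 1] checked")
```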


EXERCISES 

2 . Prove statements (i) and (ii) of Proposition 16.2. 

3 . Earlier attempts at defining the measure of a (bounded) set were similar to 
Lebesgue’s, except that the infimum was typically taken over finite unions of in- 
tervals covering the set. Show that if Q H [ 0, 1 ] is contained in a finite union of 
open intervals (J”_,(a r -, &,•), then Y2i=\(bi — #/) > 1. Thus, Q fl [0, 1 ] would have 
“measure” 1 by this definition. 

o 4 . Given any subset E of M and any h G R, show that m*(£ + /i) = m*(£), where 
Zs + ft = {jc + h : jc e £}. 

5 . If we define rE = [rx : jc g E), what is m*(rE) in terms of m*(E)l 

6. If E has nonempty interior, show that m*(E) > 0. 

7. Referring to Example 16.3 (f), show that m *(£) < m*(E) for any E C [a,b]. 

o 8. Given 8 > 0, show that m*(E) = infJ^lj l(I n ) where the infimum is taken 
over all coverings of E by sequences of intervals (/„), where each I n has diameter 
less than 8 . 

>9. If E — (X, h is a countable union of pairwise disjoint intervals, prove that 
m*(E) = ET-i t-Un)- 

10 . Prove that m* Un) = Y^L\ f° r an Y sequence ( U n ) of pairwise 

disjoint open sets. 

11 . Prove that m*(E) = inf Wn) where the infimum is taken over all cover- 
ings of E by sequences of pairwise disjoint open intervals (/„). 

12 . Prove that m*(E) == inf {m*(U) : U is open and E C U }. 

13 . Show that m*(E U F) < m*(E) + m*(F) for any sets E , F. 

14 . If E and F are countable unions of pairwise disjoint intervals, prove that 
m*(E U F) + m*(E D F) = m*(E) + m*(F). [Hint: First verify the formula 
when E and F are finite unions of pairwise disjoint intervals. How does this help?] 

15 . Prove that a subset of a set of outer measure zero is again a set of outer measure 
zero. Prove that a finite union of sets of outer measure zero has outer measure zero. 

> 16 . If m*(E) = 0, show that m*(E U A) = m*(A) = m*(A \ E) for any A . 

17. If $E \subset [a,b]$ and $m^*(E) = 0$, show that $E^c$ is dense in $[a,b]$.

18. If $E$ is a compact set with $m^*(E) = 0$, and if $\varepsilon > 0$, prove that $E$ can be covered by finitely many open intervals $I_1, \ldots, I_n$ satisfying $\sum_{i=1}^{n} \ell(I_i) < \varepsilon$.
19. For $E \subset [a,b]$, show that $m^*(E) = 0$ if and only if $E$ can be covered by a sequence of intervals $(I_n)$ such that $\sum_{n=1}^{\infty} m^*(I_n) < \infty$, and such that each $x \in E$ is in infinitely many $I_n$.

20. If $m^*(E) = 0$, prove that $m^*(E^2) = 0$, where $E^2 = \{x^2 : x \in E\}$. [Hint: First consider the case where $E$ is bounded.]

21. If $f : \mathbb{R} \to \mathbb{R}$ satisfies $|f(x) - f(y)| \le K|x - y|$ for all $x$ and $y$, show that $m^*(f(E)) \le K\, m^*(E)$ for any $E \subset \mathbb{R}$.


We have come a long way toward solving the problem of measure. We now have an extension of the notion of length that is defined for any subset of $\mathbb{R}$ and that, according to Proposition 16.2 (iii), is translation-invariant. All that is missing is countable additivity, and here, as we’ll see, is where outer measure falls short. We can come close, though: $m^*$ is at least countably subadditive.

Proposition 16.5. $m^*\big(\bigcup_{n=1}^{\infty} E_n\big) \le \sum_{n=1}^{\infty} m^*(E_n)$ for any sequence $(E_n)$ of subsets of $\mathbb{R}$.

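proof. We may clearly suppose that $m^*(E_n) < \infty$ for each $n$, for otherwise there is nothing to show. Now, let $\varepsilon > 0$. For each $n$, choose intervals $(I_{n,i})_{i=1}^{\infty}$ with

$$E_n \subset \bigcup_{i=1}^{\infty} I_{n,i} \quad\text{and}\quad \sum_{i=1}^{\infty} m^*(I_{n,i}) \le m^*(E_n) + \frac{\varepsilon}{2^n}.$$

Then $\bigcup_{n=1}^{\infty} E_n \subset \bigcup_{n=1}^{\infty}\bigcup_{i=1}^{\infty} I_{n,i}$, and so

$$m^*\Big(\bigcup_{n=1}^{\infty} E_n\Big) \le \sum_{n=1}^{\infty}\sum_{i=1}^{\infty} m^*(I_{n,i}) \le \sum_{n=1}^{\infty} m^*(E_n) + \varepsilon,$$

which proves the Proposition. □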
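For a finite family of intervals, the inequality behind Proposition 16.5 can be checked directly: the length of a union never exceeds the sum of the lengths. The following Python sketch is only an illustration under simplifying assumptions (finitely many bounded intervals given as pairs $(a, b)$ with $a \le b$, whose length is $b - a$ regardless of which endpoints are included); the helper names `total_length` and `union_length` are hypothetical.

```python
from fractions import Fraction

def total_length(intervals):
    """Sum of the lengths b - a, counting overlaps repeatedly."""
    return sum(b - a for a, b in intervals)

def union_length(intervals):
    """Length of the union of finitely many intervals, found by merging overlaps."""
    merged_total = 0
    current = None  # the merged interval currently being extended
    for a, b in sorted(intervals):
        if current is None or a > current[1]:
            if current is not None:
                merged_total += current[1] - current[0]
            current = [a, b]
        else:
            current[1] = max(current[1], b)
    if current is not None:
        merged_total += current[1] - current[0]
    return merged_total

cover = [(Fraction(0), Fraction(1, 2)), (Fraction(1, 4), Fraction(3, 4)), (Fraction(2), Fraction(3))]
print(union_length(cover), total_length(cover))  # 7/4 2
```

Overlapping intervals are counted twice by `total_length` but only once by `union_length`; that difference is exactly the slack that subadditivity allows.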

Corollary 16.6. If $m^*(E_n) = 0$ for each $n$, then $m^*\big(\bigcup_{n=1}^{\infty} E_n\big) = 0$.

Corollary 16.7. Given a subset $E$ of $\mathbb{R}$ and $\varepsilon > 0$, there is an open set $G$ containing $E$ such that $m^*(G) \le m^*(E) + \varepsilon$. Consequently,

$$m^*(E) = \inf\{m^*(U) : U \text{ is open and } E \subset U\}.$$

proof. According to Proposition 16.2 (vi), we may choose a sequence of open intervals $(I_n)$ covering $E$ such that $\sum_{n=1}^{\infty} m^*(I_n) \le m^*(E) + \varepsilon$. But then $G = \bigcup_{n=1}^{\infty} I_n$ is an open set containing $E$ and $m^*(G) \le \sum_{n=1}^{\infty} m^*(I_n) \le m^*(E) + \varepsilon$. Since $m^*(E) \le m^*(G)$ whenever $E \subset G$, the second assertion now follows. □

Although we cannot hope to show that m* is countably additive, in general, we can 
at least spell out one easy case where m* is finitely additive. 

Proposition 16.8. If $E$ and $F$ are disjoint compact sets, then $m^*(E \cup F) = m^*(E) + m^*(F)$.

proof. If $E$ and $F$ are disjoint compact sets, then

$$d(E,F) = \inf\{|x - y| : x \in E,\ y \in F\} > 0.$$

Thus, no interval of diameter less than $\delta = d(E,F)$ will hit both $E$ and $F$.

Now, given $\varepsilon > 0$, we can choose a sequence of open intervals $(I_n)$ covering $E \cup F$ such that each $I_n$ has diameter less than $\delta$, and such that $\sum_{n=1}^{\infty} m^*(I_n) \le m^*(E \cup F) + \varepsilon$. (How?) Note that a given $I_n$ can hit at most one of $E$ or $F$. Thus, if $(I'_n)$ and $(I''_n)$ denote those $I_n$ that hit $E$ and those that hit $F$, respectively, then $E \subset \bigcup_{n=1}^{\infty} I'_n$ and $F \subset \bigcup_{n=1}^{\infty} I''_n$. Hence,

$$m^*(E) + m^*(F) \le \sum_{n=1}^{\infty} m^*(I'_n) + \sum_{n=1}^{\infty} m^*(I''_n) \le \sum_{n=1}^{\infty} m^*(I_n) \le m^*(E \cup F) + \varepsilon.$$

That is, $m^*(E) + m^*(F) \le m^*(E \cup F)$. Since $m^*(E \cup F) \le m^*(E) + m^*(F)$ follows from Proposition 16.5, we are done. □
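When $E$ and $F$ are finite unions of closed bounded intervals (a special case chosen only for illustration), the gap $\delta = d(E,F)$ reduces to a minimum over pairs of component intervals, and one can test that short intervals never meet both sets. A Python sketch; the function names and the sample sets are hypothetical.

```python
from fractions import Fraction
from itertools import product

def interval_gap(I, J):
    """Distance between closed intervals I = [a, b] and J = [c, d] (0 if they meet)."""
    (a, b), (c, d) = I, J
    return max(c - b, a - d, 0)

def set_distance(E, F):
    """d(E, F) for finite unions of closed intervals, given as lists of (a, b) pairs."""
    return min(interval_gap(I, J) for I, J in product(E, F))

def meets(interval, S):
    """Does the closed interval [lo, hi] meet the finite union of closed intervals S?"""
    lo, hi = interval
    return any(lo <= b and a <= hi for a, b in S)

E = [(Fraction(0), Fraction(1)), (Fraction(3), Fraction(4))]
F = [(Fraction(3, 2), Fraction(2)), (Fraction(5), Fraction(6))]
delta = set_distance(E, F)  # attained between [0, 1] and [3/2, 2]

# Every closed window of diameter 3/8 < delta meets at most one of E and F.
for k in range(-8, 57):
    window = (Fraction(k, 8), Fraction(k, 8) + Fraction(3, 8))
    assert not (meets(window, E) and meets(window, F))
```

The bound is sharp here: the closed interval $[1, 3/2]$, of diameter exactly $\delta$, meets both sets, which is why the proof insists on diameter strictly less than $\delta$.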

Corollary 16.9. If $E_1, \ldots, E_n$ are pairwise disjoint compact sets, then $m^*\big(\bigcup_{i=1}^{n} E_i\big) = \sum_{i=1}^{n} m^*(E_i)$.


EXERCISES 

▷ 22. Let $E = \bigcup_{n=1}^{\infty} E_n$. Show that $m^*(E) = 0$ if and only if $m^*(E_n) = 0$ for every $n$.

23. Given a bounded open set $G$ and $\varepsilon > 0$, show that there is a compact set $F \subset G$ such that $m^*(F) > m^*(G) - \varepsilon$.

▷ 24. Given a subset $E$ of $\mathbb{R}$, prove that there is a $G_\delta$-set $G$ containing $E$ such that $m^*(G) = m^*(E)$.

25. Suppose that $m^*(E) > 0$. Given $0 < \alpha < 1$, show that there exists an open interval $I$ such that $m^*(E \cap I) > \alpha\, m^*(I)$. [Hint: It is enough to consider the case $m^*(E) < \infty$. Now suppose that the conclusion fails.]

26. Given $E \subset \mathbb{R}$, show that the set of points $x$ for which $m^*(E \cap I) > 0$, for all open intervals $I$ containing $x$, is a closed set.

27. For each $n$, let $G_n$ be an open subset of $[0,1]$ containing the rationals in $[0,1]$ with $m^*(G_n) < 1/n$, and let $H = \bigcap_{n=1}^{\infty} G_n$. Prove that $m^*(H) = 0$ and that $[0,1] \setminus H$ is a first category set in $[0,1]$. Thus, $[0,1]$ is the disjoint union of two “small” sets!

▷ 28. Fix $\alpha$ with $0 < \alpha < 1$ and repeat our “middle thirds” construction for the Cantor set except that now, at the $n$th stage, each of the $2^{n-1}$ open intervals we discard from $[0,1]$ is to have length $(1 - \alpha)3^{-n}$. (We still want to remove each open interval from the “middle” of a closed interval in the current level; it is important that the closed intervals that remain turn out to be nested.) The limit of this process, a set that we will name $\Delta_\alpha$, is called a generalized Cantor set and is very much like the ordinary Cantor set. Note that $\Delta_\alpha$ is uncountable, compact, nowhere dense, and so on, but has nonzero outer measure. Indeed, check that $m^*(\Delta_\alpha) = \alpha$. (See Chapter Two for an example.) [Hint: You only need upper estimates for $m^*(\Delta_\alpha)$ and $m^*([0,1] \setminus \Delta_\alpha)$.]
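The bookkeeping behind the hint can be checked with exact arithmetic. Through stage $N$ the construction removes total length $\sum_{n=1}^{N} 2^{n-1}(1-\alpha)3^{-n} = (1-\alpha)\big(1 - (2/3)^N\big)$, so the closed intervals that remain have total length $\alpha + (1-\alpha)(2/3)^N$, which decreases to $\alpha$. The Python sketch below builds the stages explicitly for $\alpha = 1/2$ (an arbitrary choice); the function names are hypothetical.

```python
from fractions import Fraction

def generalized_cantor_stage(alpha, stages):
    """Closed intervals left after `stages` steps of the construction in Exercise 28."""
    intervals = [(Fraction(0), Fraction(1))]
    for n in range(1, stages + 1):
        gap = (1 - alpha) / Fraction(3) ** n  # length of each open middle interval removed
        refined = []
        for a, b in intervals:
            mid = (a + b) / 2
            refined.append((a, mid - gap / 2))
            refined.append((mid + gap / 2, b))
        intervals = refined
    return intervals

def total_length(intervals):
    return sum(b - a for a, b in intervals)

alpha = Fraction(1, 2)
for N in range(1, 8):
    stage = generalized_cantor_stage(alpha, N)
    assert all(a < b for a, b in stage)  # every remaining interval is nondegenerate
    assert total_length(stage) == alpha + (1 - alpha) * Fraction(2, 3) ** N
```

These totals only supply the upper estimate for $m^*(\Delta_\alpha)$; the matching estimate for the complement is the other half of the exercise.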

29. In the notation of Exercise 28, check that $\bigcup_{n=2}^{\infty} \Delta_{1-(1/n)}$ has outer measure 1. Use this to give another proof that $[0,1]$ can be written as the disjoint union of a set of first category and a set of measure zero.

30. Here is a related construction: Let $(I_n)$ be an enumeration of all of the closed subintervals of $[0,1]$ having rational endpoints (this is a countable collection). In each $I_n$, build a generalized Cantor set $K_n$ having measure $m^*(K_n) = m^*(I_n)/2^n$. Now let $K = \bigcup_{n=1}^{\infty} K_n$. Prove that both $K$ and its complement are dense in $[0,1]$ and that both have positive outer measure.


Riemann Integrability 


Rather than generate more properties of $m^*$, let’s take a break for an important application: We next present Lebesgue’s criterion for Riemann integrability (which is a restatement of Riemann’s own criterion).


Theorem 16.10. Let $f : [a,b] \to \mathbb{R}$ be bounded. Then, $f$ is Riemann integrable on $[a,b]$ if and only if $m^*(D(f)) = 0$, that is, if and only if $f$ is continuous at almost every point in $[a,b]$.


Before we dive into the proof, please note that the condition “continuous at almost every point” or, briefly, “continuous a.e.,” means something very different from the condition “almost everywhere equal to a continuous function.” Indeed, the characteristic function of the rationals is almost everywhere equal to 0 (a continuous function) but is not continuous at any point. Moreover, note that the characteristic function of $[0, 1/2]$ is continuous a.e. in $[0,1]$ but is clearly not equal a.e. to any continuous function. (Why?) Thus, the two conditions are incomparable in spite of their apparent similarity.

Next, let’s recall our notation. First,

$$D(f) = \{x \in [a,b] : \omega_f(x) > 0\} = \bigcup_{n=1}^{\infty}\Big\{x \in [a,b] : \omega_f(x) \ge \frac{1}{n}\Big\},$$

where

$$\omega_f(x) = \inf_{I \ni x} \omega(f; I) = \inf_{I \ni x}\ \sup_{s,t \in I} |f(s) - f(t)|,$$

and where $I$ denotes an open interval containing $x$. Recall, too, that the set $\{x : \omega_f(x) \ge 1/n\}$ is closed for each $n$. We will refer to this set using the abbreviated notation $\{\omega_f \ge 1/n\}$. Now, since $D(f)$ is written as a countable union, we may rephrase the conclusion of Lebesgue’s theorem:

$$f \in \mathcal{R}[a,b] \iff m^*(D(f)) = 0 \iff m^*\Big(\Big\{\omega_f \ge \frac{1}{n}\Big\}\Big) = 0 \text{ for all } n.$$

(Why?) Finally, recall that the difference between an upper and a lower sum can be written in terms of the oscillation of $f$:

$$U(f,P) - L(f,P) = \sum_{i=1}^{n} \omega(f; [x_{i-1}, x_i])\,\Delta x_i,$$

where, in our new terminology, $\Delta x_i = m^*([x_{i-1}, x_i])$. The fact that $\omega_f(x)$ is defined in terms of open intervals while $U(f,P) - L(f,P)$ is written in terms of closed intervals is a minor nuisance, but nothing we can’t handle.
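For a monotone function the oscillation on each closed subinterval is explicit: if $f$ is increasing, then $\omega(f; [x_{i-1}, x_i]) = f(x_i) - f(x_{i-1})$, so the sum above telescopes. A minimal Python sketch, assuming $f(x) = x^2$ on $[0,1]$ and uniform partitions (both choices are illustrative only), evaluates $U(f,P) - L(f,P)$ exactly; the function name is hypothetical.

```python
from fractions import Fraction

def upper_minus_lower_increasing(f, a, b, n):
    """U(f,P) - L(f,P) = sum of omega(f; [x_{i-1}, x_i]) * dx_i for an increasing f,
    where P is the uniform partition of [a, b] into n subintervals."""
    xs = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    # For increasing f the oscillation on [x_{i-1}, x_i] is f(x_i) - f(x_{i-1}).
    return sum((f(xs[i]) - f(xs[i - 1])) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))

square = lambda x: x * x
for n in (1, 2, 10, 100):
    # Telescoping gives (f(b) - f(a)) * (b - a) / n, which is 1/n here.
    assert upper_minus_lower_increasing(square, Fraction(0), Fraction(1), n) == Fraction(1, n)
```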

Since this is essentially all that is needed to prove the forward direction of Lebesgue’s 
theorem, let’s get that out of the way. 

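proof (of Theorem 16.10, forward implication). Let $f \in \mathcal{R}[a,b]$, and fix $k \ge 1$. We will show that $m^*(\{\omega_f \ge 1/k\}) = 0$ and, hence, that $m^*(D(f)) = 0$.

Given $\varepsilon > 0$, choose a partition $P = \{x_0, \ldots, x_n\}$ such that

$$U(f,P) - L(f,P) = \sum_{i=1}^{n} \omega(f; [x_{i-1}, x_i])\,\Delta x_i < \frac{\varepsilon}{k}.$$

Notice that if $x \in \{\omega_f \ge 1/k\} \cap (x_{i-1}, x_i)$, then $\omega(f; [x_{i-1}, x_i]) \ge \omega_f(x) \ge 1/k$.

Now, since the open intervals $(x_{i-1}, x_i)$, $i = 1, \ldots, n$, cover all but finitely many points of $[a,b]$, it follows that those that hit $\{\omega_f \ge 1/k\}$ will cover all but finitely many points of $\{\omega_f \ge 1/k\}$. But finite sets have outer measure 0; hence

$$\frac{\varepsilon}{k} > \sum_{i=1}^{n} \omega(f; [x_{i-1}, x_i])\,\Delta x_i \ge \frac{1}{k}\,{\sum}' \Delta x_i \ge \frac{1}{k}\, m^*\Big(\Big\{\omega_f \ge \frac{1}{k}\Big\}\Big),$$

where ${\sum}'$ denotes the sum over those $i$ for which $\{\omega_f \ge 1/k\} \cap (x_{i-1}, x_i) \ne \varnothing$. Thus, $m^*(\{\omega_f \ge 1/k\}) < \varepsilon$. □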

The backward direction of Lebesgue’s theorem is somewhat harder. We begin, 
though, with an easy observation. 

Lemma 16.11. If $\omega_f(x) < \delta$ for all $x$ in some compact interval $J$, then there is a partition $Q = \{t_0, \ldots, t_n\}$ of $J$ such that $\omega(f; [t_{i-1}, t_i]) < \delta$ for all $i = 1, \ldots, n$. Hence, $U(f,Q) - L(f,Q) \le \delta\, m^*(J)$.

proof. For each $x \in J$, choose an open interval $I_x$ containing $x$ such that $\omega(f; I_x) < \delta$ and a second open interval $J_x$ with $x \in J_x \subset \overline{J_x} \subset I_x$. Note that $\omega(f; \overline{J_x}) < \delta$, too. The intervals $J_x$ form an open cover for the compact set $J$, and so finitely many will do the job, say, $J \subset \bigcup_{i=1}^{k} J_i$, where $\omega(f; \overline{J_i}) < \delta$ for each $i = 1, \ldots, k$.

Now let $Q = \{t_0, \ldots, t_n\}$ be any partition of $J$ containing the endpoints of each of the intervals $J_1, \ldots, J_k$ that lie in $J$. Then, since each interval $[t_{i-1}, t_i]$ is contained in some $\overline{J_m}$, we have $\omega(f; [t_{i-1}, t_i]) < \delta$. Hence,

$$U(f,Q) - L(f,Q) = \sum_{i=1}^{n} \omega(f; [t_{i-1}, t_i])\,\Delta t_i \le \delta \sum_{i=1}^{n} \Delta t_i = \delta\, m^*(J).$$

□
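In general Lemma 16.11 needs the compactness argument above, but in the special case of a Lipschitz function (a stronger hypothesis than the lemma assumes, used here only for illustration) an explicit partition works: if $|f(s) - f(t)| \le K|s - t|$, then any partition with mesh less than $\delta/K$ has oscillation less than $\delta$ on every piece. A Python sketch with the hypothetical helper `lipschitz_partition`, assuming $f(x) = x^2$ on $J = [0,1]$ with $K = 2$ and $\delta = 1/10$:

```python
from fractions import Fraction
import math

def lipschitz_partition(a, b, K, delta):
    """Uniform partition of [a, b] with mesh (b - a)/n < delta/K (hypothetical helper).
    If |f(s) - f(t)| <= K|s - t|, every piece then has oscillation at most K*mesh < delta."""
    n = math.floor(K * (b - a) / delta) + 1  # smallest integer n with n > K(b - a)/delta
    return [a + (b - a) * Fraction(i, n) for i in range(n + 1)]

f = lambda x: x * x  # Lipschitz on [0, 1] with constant K = 2
a, b, K, delta = Fraction(0), Fraction(1), 2, Fraction(1, 10)
Q = lipschitz_partition(a, b, K, delta)

# f is increasing on [0, 1], so its oscillation on [t_{i-1}, t_i] is f(t_i) - f(t_{i-1}).
oscillations = [f(Q[i]) - f(Q[i - 1]) for i in range(1, len(Q))]
gap = sum(w * (Q[i] - Q[i - 1]) for i, w in enumerate(oscillations, start=1))
assert all(w < delta for w in oscillations)
assert gap <= delta * (b - a)  # U(f,Q) - L(f,Q) <= delta * m*(J), as in the lemma
```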

Finally, we are ready to finish the proof of Theorem 16.10. 

proof (of Theorem 16.10, reverse implication). Suppose that $m^*(D(f)) = 0$; that is, suppose that $m^*(\{\omega_f \ge 1/k\}) = 0$ for all $k$. We must show that $f \in \mathcal{R}[a,b]$.

Given $\varepsilon > 0$, we first choose a positive integer $k$ with $1/k < \varepsilon$. Next, since $\{\omega_f \ge 1/k\}$ is compact, we can find finitely many open intervals $I_1, \ldots, I_n$ such that $\{\omega_f \ge 1/k\} \subset \bigcup_{j=1}^{n} I_j$ and $\sum_{j=1}^{n} m^*(I_j) < \varepsilon$. (How?)

Now $[a,b] \setminus \bigcup_{j=1}^{n} I_j$ is a finite union of closed intervals, say $J_1, \ldots, J_r$, and $\omega_f(x) < 1/k < \varepsilon$ at each point $x \in \bigcup_{i=1}^{r} J_i$. In this way, $[a,b]$ has been decomposed into two sets of intervals: the $I_j$, which have small total length, and the $J_i$, on which $f$ has small oscillation. We may apply Lemma 16.11 to find partitions $Q_1, \ldots, Q_r$ of $J_1, \ldots, J_r$ such that $U(f,Q_i) - L(f,Q_i) \le \varepsilon\, m^*(J_i)$ for each $i = 1, \ldots, r$.

If we define a partition of $[a,b]$ by setting $P = \{a, b\} \cup \big(\bigcup_{i=1}^{r} Q_i\big)$, then

$$\begin{aligned}
U(f,P) - L(f,P) &\le \sum_{i=1}^{r}\big[U(f,Q_i) - L(f,Q_i)\big] + \sum_{j=1}^{n} \omega(f; I_j)\, m^*(I_j) \\
&\le \varepsilon \sum_{i=1}^{r} m^*(J_i) + 2\|f\|_\infty \sum_{j=1}^{n} m^*(I_j) \\
&\le \varepsilon(b-a) + 2\varepsilon\|f\|_\infty.
\end{aligned}$$

Hence, $f \in \mathcal{R}[a,b]$. □
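As an illustration of the theorem (the example is not part of the text above), consider Thomae’s function on $[0,1]$: $f(p/q) = 1/q$ for a rational $p/q$ in lowest terms, with $f(0) = 1$, and $f(x) = 0$ for irrational $x$. It is discontinuous exactly at the rationals, a null set, so Lebesgue’s criterion says it is Riemann integrable. The Python sketch below computes exact upper sums on uniform partitions; every lower sum is 0 because each subinterval contains irrationals. The function names are hypothetical.

```python
from fractions import Fraction

def min_denominator(i, n):
    """Least q such that some p/q lies in [i/n, (i+1)/n]: ceil(i*q/n) <= floor((i+1)*q/n)."""
    q = 1
    while -(-i * q // n) > (i + 1) * q // n:
        q += 1
    return q

def thomae_upper_sum(n):
    """U(f, P) for Thomae's function on [0, 1], P the uniform partition into n pieces.
    The sup of f on [i/n, (i+1)/n] is 1/q for the least denominator q found there."""
    return sum(Fraction(1, min_denominator(i, n)) for i in range(n)) / n

# The lower sums are all 0 (every subinterval contains irrationals, where f = 0),
# so these upper sums equal U(f, P) - L(f, P).
for n in (10, 100, 1000):
    print(n, float(thomae_upper_sum(n)))
```

Since the uniform partitions for $n = 10, 100, 1000$ refine one another, the printed upper sums cannot increase.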

Combining Lebesgue’s criterion with Theorem 14.19 yields two useful corollaries 
(see also Exercise 14.50). 

Corollary 16.12. If $f \in \mathcal{R}[a,b]$ and $F(x) = \int_a^x f$, then $F' = f$ a.e. (In particular, $F'$ exists a.e.)

Corollary 16.13. If $f \in \mathcal{R}[a,b]$ and $\int_a^b |f| = 0$, then $f = 0$ a.e.


EXERCISES 

▷ 31. For which subsets $A \subset [a,b]$ is $\chi_A$ Riemann integrable?
32. Prove Corollary 16.12. 
33. Give a direct proof of Corollary 16.13. [Hint: If $f$ is continuous at $x_0$, and if $f(x_0) \ne 0$, show that $\int_a^b |f| > 0$.]

34. If $f \in \mathcal{R}[a,b]$ and $\int_a^x f = 0$ for all $x$, prove that $f = 0$ a.e.

▷ 35. If $f \in \mathcal{R}[a,b]$ and $f = g$ a.e., does it follow that $g \in \mathcal{R}[a,b]$? What if “a.e.” is weakened to “except at countably many points”? Or to “except at finitely many points”?

36. If $f, g \in \mathcal{R}[a,b]$ and $f = g$ a.e., does it follow that $\int_a^b f = \int_a^b g$?

37. Let $G$ be an open set containing the rationals in $[0,1]$ with $m^*(G) < 1/2$. Prove that $f = \chi_G$ is not Riemann integrable on $[0,1]$. Moreover, prove that $f$ cannot be equal a.e. to any Riemann integrable function on $[0,1]$; in other words, $f$ is “substantially different” from any Riemann integrable function.


Measurable Sets 

Let’s briefly summarize our progress thus far. We have successfully defined a nonnegative function $m^*$, defined on all subsets of $\mathbb{R}$, that satisfies:

• $m^*$ extends the notion of length; if $I$ is an interval, then $m^*(I)$ is the length of $I$.

• $m^*$ is translation invariant; $m^*(E + x) = m^*(E)$ for all $E$ and all $x \in \mathbb{R}$.

• $m^*$ is countably subadditive; $m^*\big(\bigcup_{n=1}^{\infty} E_n\big) \le \sum_{n=1}^{\infty} m^*(E_n)$ for any sequence of sets $(E_n)$.

• $m^*$ is countably additive in certain cases; if $(G_n)$ is a sequence of pairwise disjoint open sets, then $m^*\big(\bigcup_{n=1}^{\infty} G_n\big) = \sum_{n=1}^{\infty} m^*(G_n)$. (Why?)

• $m^*$ is completely determined by its values on open sets; indeed, $m^*(E) = \inf\{m^*(U) : U \text{ is open and } E \subset U\}$.

The rumored failure of m* to be countably additive, in general, will have to be taken on 
faith for just a bit longer - we will see an example later in this chapter. For now, let’s 
concentrate on the good news: By taking a closer look at our last two observations, it 
is possible to isolate a large class of sets on which m* is countably additive. The secret 
is to consider sets that are, in a sense, “approximately open.” 

Specifically, we say that a set $E$ is (Lebesgue) measurable if, for each $\varepsilon > 0$, we can find a closed set $F$ and an open set $G$ with $F \subset E \subset G$ such that $m^*(G \setminus F) < \varepsilon$.

Please note that if $E$ is measurable, then so is $E^c$, since $G^c \subset E^c \subset F^c$ and $F^c \setminus G^c = G \setminus F$. In fact, we might paraphrase the measurability condition by saying that both $E$ and $E^c$ are required to be “approximately open.” In any case, notice that $E$ is measurable if and only if $E^c$ is measurable.

It is very easy to see that any interval, bounded or otherwise, is measurable. Equally simple is that any null set is measurable. Indeed, if $m^*(E) = 0$, then, for any $\varepsilon > 0$, we may choose an open set $G$ containing $E$ such that $m^*(G) < \varepsilon$. Since $F = \varnothing$ is a perfectly legitimate closed subset of $E$, it follows that $E$ is measurable.
It is less clear that every open (closed) set is measurable. To help us with this task, let’s first legitimize the usual operations with measurable sets.

Lemma 16.14. If $E_1$ and $E_2$ are measurable sets, then so are $E_1 \cup E_2$, $E_1 \cap E_2$, and $E_1 \setminus E_2$.


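proof. Since $E_1 \cap E_2 = (E_1^c \cup E_2^c)^c$ and $E_1 \setminus E_2 = E_1 \cap E_2^c$, it is enough to check that $E_1 \cup E_2$ is measurable whenever $E_1$ and $E_2$ are measurable. (Why?)

Let $\varepsilon > 0$. Choose closed sets $F_1, F_2$ and open sets $G_1, G_2$, with $F_1 \subset E_1 \subset G_1$ and $F_2 \subset E_2 \subset G_2$, and such that $m^*(G_1 \setminus F_1) < \varepsilon/2$ and $m^*(G_2 \setminus F_2) < \varepsilon/2$. Then $F = F_1 \cup F_2$ is closed, $G = G_1 \cup G_2$ is open, $F \subset E_1 \cup E_2 \subset G$, and $G \setminus F \subset (G_1 \setminus F_1) \cup (G_2 \setminus F_2)$. Thus,

$$m^*(G \setminus F) \le m^*(G_1 \setminus F_1) + m^*(G_2 \setminus F_2) < \varepsilon.$$

□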

We will write $\mathcal{M}$ for the collection of all measurable subsets of $\mathbb{R}$. Our goal in this section is to show that $\mathcal{M}$ contains a wealth of sets. From what we have just shown, we know that $\mathcal{M}$ is an algebra of sets (sometimes called a Boolean algebra or Boolean lattice). Specifically, this means that $E^c \in \mathcal{M}$ whenever $E \in \mathcal{M}$ and $E \cup F \in \mathcal{M}$ whenever $E, F \in \mathcal{M}$. By induction (and De Morgan’s laws), it is easy to see that $\mathcal{M}$ is actually closed under any finite string of set operations.

The hard work comes in showing that $\mathcal{M}$ is closed under countable unions and intersections, too. From this it will follow that $\mathcal{M}$ contains the open sets, the closed sets, the $G_\delta$-sets, the $F_\sigma$-sets, and so on. That may sound like a lot of sets, but all of these constitute a mere drop in the bucket! (All of the sets that we have listed so far, for example, form a collection having cardinality only $\mathfrak{c}$, whereas there are $2^{\mathfrak{c}}$ subsets of $\mathbb{R}$ altogether.)

In fact, the simple observation that $\Delta \in \mathcal{M}$, where $\Delta$ is the Cantor set, already implies that $\mathcal{M}$ is a huge collection of sets. How? Well, since $\Delta$ is a null set, so is every subset of $\Delta$. Consequently, $\Delta$ and all of its subsets are measurable; thus, $\mathcal{P}(\Delta) \subset \mathcal{M} \subset \mathcal{P}(\mathbb{R})$. But $\Delta$ has the same cardinality as $\mathbb{R}$, and hence $\mathcal{M}$ has the same cardinality as $\mathcal{P}(\mathbb{R})$. Given this, it may surprise you to learn that there are, in fact, nonmeasurable subsets of $\mathbb{R}$. On the other hand, it will now come as no surprise that finding an example of a nonmeasurable set is by no means easy. This strange example awaits us later in this chapter, where we will solve the mystery of the lost countable additivity of $m^*$.

But for now, back to work! We still need to establish that open sets are measurable. 
We will begin by showing that bounded open sets and bounded closed sets (i.e., compact 
sets) are measurable. 

Lemma 16.15.

(i) If $G$ is a bounded open set, then, for every $\varepsilon > 0$, there exists a closed set $F \subset G$ such that $m^*(F) > m^*(G) - \varepsilon$.

(ii) If $F$ is a bounded closed set, then, for every $\varepsilon > 0$, there exists a bounded open set $G \supset F$ such that $m^*(G) < m^*(F) + \varepsilon$.

(iii) If $F$ is a closed subset of a bounded open set $G$, then $m^*(G \setminus F) = m^*(G) - m^*(F)$.
proof. Let $G$ be a bounded open set and write $G = \bigcup_{n=1}^{\infty} I_n$, where $(I_n)$ is a sequence of pairwise disjoint open intervals. Then (from Exercise 9), $\sum_{n=1}^{\infty} m^*(I_n) = m^*(G) < \infty$. Now, given $\varepsilon > 0$, choose $N$ such that $\sum_{n=N+1}^{\infty} m^*(I_n) < \varepsilon/2$.

For each $n = 1, \ldots, N$, choose a closed subinterval $J_n \subset I_n$ with $m^*(J_n) > m^*(I_n) - \varepsilon/(2N)$. Then, $F = \bigcup_{n=1}^{N} J_n$ is a closed subset of $G$ and, from Corollary 16.9,

$$m^*(F) = \sum_{n=1}^{N} m^*(J_n) > \sum_{n=1}^{N} m^*(I_n) - \frac{\varepsilon}{2} > m^*(G) - \varepsilon.$$

This proves (i).

Next, suppose that $F$ is a bounded closed set, and let $\varepsilon > 0$. Since $F$ is a compact set of finite outer measure, we may choose finitely many open intervals $I_1, \ldots, I_n$ such that $G = \bigcup_{j=1}^{n} I_j$ is an open set containing $F$, and such that $m^*(G) \le \sum_{j=1}^{n} m^*(I_j) < m^*(F) + \varepsilon$. This proves (ii).

Finally, suppose that $F$ is a closed subset of a bounded open set $G$. Then $G \setminus F$ is also a bounded open set. Hence, by (i), for any $\varepsilon > 0$, there is a closed set $E \subset G \setminus F$ such that $m^*(E) > m^*(G \setminus F) - \varepsilon$. But then, $E$ and $F$ are disjoint compact sets and so

$$\begin{aligned}
m^*(G) &\le m^*(G \setminus F) + m^*(F) \\
&< m^*(E) + \varepsilon + m^*(F) \\
&= m^*(E \cup F) + \varepsilon \le m^*(G) + \varepsilon.
\end{aligned}$$

Since this holds for any $\varepsilon$, we must have $m^*(G) = m^*(G \setminus F) + m^*(F)$. This completes the proof. □

Our next lemma shows that it is enough to consider bounded sets when testing 
measurability. 

Lemma 16.16. $E$ is measurable if and only if $E \cap (a,b)$ is measurable for every bounded open interval $(a,b)$.

proof. The forward implication is clear from Lemma 16.14. So, suppose that $E \cap (a,b)$ is measurable for any $(a,b)$, and let $\varepsilon > 0$. Then, in particular, for each integer $n \in \mathbb{Z}$ we can find a closed set $F_n$ and an open set $G_n$ with $F_n \subset E \cap (n, n+1) \subset G_n$ and such that $m^*(G_n \setminus F_n) < 2^{-|n|}\varepsilon$. By enlarging $G_n$ slightly, if necessary, we may also suppose that both $n, n+1 \in G_n$. In this way, $G = \bigcup_{n \in \mathbb{Z}} G_n$ is an open set containing $E$.

Now, $F = \bigcup_{n \in \mathbb{Z}} F_n$ is certainly a subset of $E$, but is it closed? Well, sure! A convergent sequence from $F$ must eventually lie in some open interval of the form $(n - 1, n + 1)$. Thus, all but finitely many terms are in $F_{n-1} \cup F_n$ for some $n$. Since $F_{n-1} \cup F_n$ is closed, the limit must be in one of the two; in particular, the limit must be back in $F$. Thus, $F$ is closed.

Finally, $G \setminus F \subset \bigcup_{n \in \mathbb{Z}} (G_n \setminus F_n)$, and hence $m^*(G \setminus F) \le \sum_{n \in \mathbb{Z}} m^*(G_n \setminus F_n) < \sum_{n \in \mathbb{Z}} 2^{-|n|}\varepsilon = 3\varepsilon$. □
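The constant 3 in the last estimate comes from splitting the two-sided geometric series at $n = 0$:

$$\sum_{n \in \mathbb{Z}} 2^{-|n|} = 2^{0} + 2\sum_{n=1}^{\infty} 2^{-n} = 1 + 2 \cdot 1 = 3,$$

so $m^*(G \setminus F) < 3\varepsilon$, and since $\varepsilon > 0$ was arbitrary, $E$ is measurable.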
Corollary 16.17. Open sets, and hence also closed sets, are measurable.

Finally we are ready to show that M is closed under countable disjoint unions. At 
the same time, we will show that m* is countably additive when applied to pairwise 
disjoint measurable sets. 

Theorem 16.18. If $(E_n)$ is a sequence of pairwise disjoint measurable sets, then $E = \bigcup_{n=1}^{\infty} E_n$ is measurable and $m^*(E) = \sum_{n=1}^{\infty} m^*(E_n)$.

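proof. We first suppose that $E$ is contained in some bounded open interval $I$ and, in particular, that $m^*(E) < \infty$. Of course, this means that $E_n \subset I$ for all $n$, too. Now, given $\varepsilon > 0$, choose closed sets $(F_n)$ and open sets $(G_n)$ such that $F_n \subset E_n \subset G_n \subset I$ and such that $m^*(G_n \setminus F_n) < 2^{-n}\varepsilon$ for all $n$. Next, since the $E_n$ are pairwise disjoint and bounded, so are the $F_n$. Hence, for any $K$, we have

$$\begin{aligned}
\sum_{n=1}^{K} m^*(E_n) &\le \sum_{n=1}^{K} m^*(F_n) + \varepsilon \qquad \text{(Why?)} \\
&= m^*\Big(\bigcup_{n=1}^{K} F_n\Big) + \varepsilon \\
&\le m^*(E) + \varepsilon.
\end{aligned}$$

Since $K$ and $\varepsilon$ are arbitrary, it follows that $\sum_{n=1}^{\infty} m^*(E_n) \le m^*(E)$. Thus, $m^*(E) = \sum_{n=1}^{\infty} m^*(E_n)$, since the other inequality is supplied by countable subadditivity.

Next, notice that we also have

$$\sum_{n=1}^{\infty} m^*(G_n) \le \sum_{n=1}^{\infty} m^*(E_n) + \varepsilon = m^*(E) + \varepsilon < \infty.$$

In particular, we may choose $N$ such that $\sum_{n=N+1}^{\infty} m^*(G_n) < \varepsilon$. Finally, $G = \bigcup_{n=1}^{\infty} G_n$ is an open set containing $E$ and $F = \bigcup_{n=1}^{N} F_n$ is a closed set contained in $E$ with

$$m^*(G \setminus F) \le \sum_{n=1}^{N} m^*(G_n \setminus F_n) + \sum_{n=N+1}^{\infty} m^*(G_n) < 2\varepsilon.$$

Hence, $E$ is measurable.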

Lastly, suppose that $E$ is unbounded. We know that $E$ is measurable from the first part of the proof (and Lemma 16.16), but we still have to check countable additivity in this case. To this end, consider

$$E_{n,k} = E_n \cap (k, k+1] \quad\text{and}\quad A_k = E \cap (k, k+1], \quad \text{for } k \in \mathbb{Z}.$$

The sets $(E_{n,k})$ and $(A_k)$ are measurable and pairwise disjoint and, of course,

$$E_n = \bigcup_{k=-\infty}^{\infty} E_{n,k} \quad\text{and}\quad E = \bigcup_{k=-\infty}^{\infty} A_k = \bigcup_{k=-\infty}^{\infty}\bigcup_{n=1}^{\infty} E_{n,k}.$$
By countable subadditivity and the first part of the proof we have

$$\sum_{n=1}^{\infty} m^*(E_n) \le \sum_{n=1}^{\infty}\sum_{k=-\infty}^{\infty} m^*(E_{n,k}) = \sum_{k=-\infty}^{\infty}\sum_{n=1}^{\infty} m^*(E_{n,k}) = \sum_{k=-\infty}^{\infty} m^*(A_k),$$

since each $A_k$ is a bounded measurable set. But, for any $N$, the first part of the proof also tells us that

$$\sum_{k=-N}^{N} m^*(A_k) = m^*\Big(\bigcup_{k=-N}^{N} A_k\Big) \le m^*(E).$$

Putting the pieces together, we get $\sum_{n=1}^{\infty} m^*(E_n) \le m^*(E)$, which is all we need. □

Theorem 16.18 tells us that M is closed under countable disjoint unions, but what 
about arbitrary countable unions? Well, as luck would have it, since M is an algebra 
of sets, disjoint unions are the rule and not the exception. 

Lemma 16.19. Given any sequence $(A_i)$ of measurable sets, we can find a sequence $(B_i)$ of disjoint measurable sets such that $B_i \subset A_i$ for all $i$ and $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$.

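proof. Let $B_1 = A_1$, and for each $n > 1$ let $B_n = A_n \setminus \bigcup_{i=1}^{n-1} A_i$. Then $B_n \in \mathcal{M}$, since $\mathcal{M}$ is an algebra. Clearly, $B_i \subset A_i$ for all $i$, $B_i \cap B_j = \varnothing$ for $i \ne j$, and $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$ for all $n$. □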
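The construction in Lemma 16.19 is purely set-theoretic, so it can be tried out on finite sets. The Python sketch below uses finite sets of integers as stand-ins for measurable sets (an assumption made only for illustration; no measure is involved), with the hypothetical helper `disjointify`.

```python
def disjointify(sets):
    """Return B_1 = A_1 and B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1}), as in Lemma 16.19."""
    seen = set()  # running union A_1 ∪ ... ∪ A_{n-1}
    result = []
    for A in sets:
        result.append(set(A) - seen)
        seen |= set(A)
    return result

A = [{1, 2, 3}, {2, 3, 4, 5}, {1, 5, 6}]
B = disjointify(A)
assert all(B[i] <= A[i] for i in range(len(A)))  # B_i ⊂ A_i
assert all(B[i].isdisjoint(B[j]) for i in range(len(B)) for j in range(i))  # pairwise disjoint
assert set().union(*A) == set().union(*B)  # same union
```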

Corollary 16.20. If $(E_n)$ is an arbitrary sequence of measurable sets, then $\bigcup_{n=1}^{\infty} E_n$ and $\bigcap_{n=1}^{\infty} E_n$ are measurable.

An algebra of sets that is closed under countable unions (or intersections) is called a $\sigma$-algebra. Thus, we have shown that the collection $\mathcal{M}$ of measurable sets is a $\sigma$-algebra and that the restriction of $m^*$ to $\mathcal{M}$ is countably additive (and so is a solution, of sorts, to the problem of measure).

Lebesgue measure $m$ is defined to be the restriction of $m^*$ to $\mathcal{M}$. If $E$ is measurable, we write $m(E)$ in place of $m^*(E)$, and we refer to $m(E)$ as the (Lebesgue) measure of $E$.





EXERCISES 

38. Prove that $E$ is measurable if and only if $E \cap K$ is measurable for every compact set $K$.

39. If $A \supset B$ are measurable, show that $m(A \setminus B) = m(A) - m(B)$ whenever $m(B) < \infty$.

40. If $A$ and $B$ are measurable sets, show that $m(A \cup B) + m(A \cap B) = m(A) + m(B)$.

41. Let $E$ denote the set of all real numbers in $[0,1]$ whose decimal expansions contain no 5’s or 7’s. Prove that $E$ is measurable and compute $m(E)$. [Hint: There are only a few “ambiguous” numbers; it does not matter whether they are included. Why?]

▷ 42. Suppose that $E$ is measurable with $m(E) = 1$. Show that:

(a) There is a measurable set $F \subset E$ such that $m(F) = 1/2$. [Hint: Consider the function $f(x) = m(E \cap (-\infty, x])$.]

(b) There is a closed set $F$, consisting entirely of irrationals, such that $F \subset E$ and $m(F) = 1/2$.

(c) There is a compact set $F$ with empty interior such that $F \subset E$ and $m(F) = 1/2$.

43. Let $E \subset [a,b]$. According to Lebesgue’s original definition, $E$ is measurable if and only if $m_*(E) = m^*(E)$. (See Example 16.3 (f).) Check that Lebesgue’s definition is the same as ours in this case. [Hint: It is easy to see that our notion of measurability implies Lebesgue’s. If, on the other hand, $E$ is measurable according to Lebesgue’s definition, note that an open superset of $[a,b] \setminus E$ supplies a closed subset of $E$.]

44. Let $E$ be a measurable set with $m(E) > 0$. Prove that $E - E = \{x - y : x, y \in E\}$ contains an interval centered at 0. [This is a famous result due to Steinhaus. There are several proofs available; here is a particularly simple one: Take $I$ as in Exercise 25 for $\alpha = 3/4$. If $|x| < m(I)/2$, note that $I \cup (I + x)$ has measure at most $3m(I)/2$. Thus, $E \cap I$ and $(E \cap I) + x$ cannot be disjoint. (Why?) Finally, $(E + x) \cap E \ne \varnothing$ means that $x \in E - E$; that is, $E - E \supset (-m(I)/2,\ m(I)/2)$.]

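45. Let $f : X \to Y$ be any function.

(a) If $\mathcal{B}$ is a $\sigma$-algebra of subsets of $Y$, show that $\mathcal{A} = \{f^{-1}(B) : B \in \mathcal{B}\}$ is a $\sigma$-algebra of subsets of $X$.

(b) If $\mathcal{A}$ is a $\sigma$-algebra of subsets of $X$, show that $\mathcal{B} = \{B : f^{-1}(B) \in \mathcal{A}\}$ is a $\sigma$-algebra of subsets of $Y$.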

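46. Let $\mathcal{A}$ be an algebra of sets. Show that the following are equivalent:

(i) $\mathcal{A}$ is closed under arbitrary countable unions; that is, if $E_n \in \mathcal{A}$ for all $n$, then $\bigcup_{n=1}^{\infty} E_n \in \mathcal{A}$.

(ii) $\mathcal{A}$ is closed under countable disjoint unions; that is, if $(E_n)$ is a sequence of pairwise disjoint sets from $\mathcal{A}$, then $\bigcup_{n=1}^{\infty} E_n \in \mathcal{A}$.

(iii) $\mathcal{A}$ is closed under increasing countable unions; that is, if $E_n \in \mathcal{A}$ for all $n$, and if $E_n \subset E_{n+1}$ for all $n$, then $\bigcup_{n=1}^{\infty} E_n \in \mathcal{A}$.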

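47. Show that $\{\varnothing, \mathbb{R}\}$ and $\mathcal{P}(\mathbb{R})$ are both $\sigma$-algebras, and that $\{\varnothing, \mathbb{R}\} \subset \mathcal{A} \subset \mathcal{P}(\mathbb{R})$ holds for any other $\sigma$-algebra $\mathcal{A}$ of subsets of $\mathbb{R}$.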

▷ 48. Let $\mathcal{E}$ be any collection of subsets of $\mathbb{R}$. Show that there is always a smallest $\sigma$-algebra $\mathcal{A}$ containing $\mathcal{E}$. [Hint: Show that the intersection of $\sigma$-algebras is again a $\sigma$-algebra.]

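▷ 49. The smallest $\sigma$-algebra containing $\mathcal{E}$ is called the $\sigma$-algebra generated by $\mathcal{E}$ and is denoted by $\sigma(\mathcal{E})$. If $\mathcal{E} \subset \mathcal{F}$, prove that $\sigma(\mathcal{E}) \subset \sigma(\mathcal{F})$.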

50. Prove that $\mathcal{A} = \{E \subset \mathbb{R} : \text{either } E \text{ or } E^c \text{ is countable}\}$ is a $\sigma$-algebra; in fact, $\mathcal{A}$ is the $\sigma$-algebra generated by the singletons.
51. Let $\mathcal{A} = \{E \subset \mathbb{R} : \text{either } E \text{ or } E^c \text{ is finite}\}$. Is $\mathcal{A}$ an algebra? Is $\mathcal{A}$ a $\sigma$-algebra? Explain.

52. Show that $\mathcal{A} = \{E \subset \mathbb{R} : \text{either } m(E) = 0 \text{ or } m(E^c) = 0\}$ is a $\sigma$-algebra; in fact, $\mathcal{A}$ is the $\sigma$-algebra generated by the null sets.

The Borel $\sigma$-algebra $\mathcal{B}$ is defined to be the smallest $\sigma$-algebra of subsets of $\mathbb{R}$ containing the open sets; equivalently, $\mathcal{B}$ is the $\sigma$-algebra generated by the (open) intervals (see Exercise 53). The elements of $\mathcal{B}$ are called the Borel sets. Notice that closed sets, $G_\delta$-sets, $F_\sigma$-sets, $G_{\delta\sigma}$-sets, and so on, are all Borel sets. From Corollaries 16.17 and 16.20, every Borel set is measurable; that is, $\mathcal{B} \subset \mathcal{M}$.

▷ 53. Show that $\mathcal{B}$ is generated by each of the following:

(i) The open intervals $\mathcal{E}_1 = \{(a,b) : a < b\}$.

(ii) The closed intervals $\mathcal{E}_2 = \{[a,b] : a < b\}$.

(iii) The half-open intervals $\mathcal{E}_3 = \{(a,b],\ [a,b) : a < b\}$.

(iv) The open rays $\mathcal{E}_4 = \{(a,\infty),\ (-\infty,b) : a, b \in \mathbb{R}\}$.

(v) The closed rays $\mathcal{E}_5 = \{[a,\infty),\ (-\infty,b] : a, b \in \mathbb{R}\}$.

[Hint: It is easy to see that $\mathcal{B} = \sigma(\mathcal{E}_1)$. In each of the remaining cases, you just need to show that $\mathcal{E}_1 \subset \sigma(\mathcal{E}_i)$ for $i = 2, 3, 4, 5$. Why?]

54. Prove that the collection of all open subsets of $\mathbb{R}$ has cardinality $\mathfrak{c}$. What is the cardinality of the collection of all $G_\delta$ subsets of $\mathbb{R}$?


The Structure of Measurable Sets 

At this point we know that the collection $\mathcal{M}$ of measurable sets is a $\sigma$-algebra containing the open sets, and hence all of the Borel sets $\mathcal{B}$, and we know that Lebesgue measure $m$, the restriction of Lebesgue outer measure $m^*$ to $\mathcal{M}$, is countably additive on $\mathcal{M}$. Moreover, we know that $m^*$, and hence also $m$, is completely determined by its values on open sets. In this section, we will pursue this last observation still further and, in so doing, arrive at a connection between the Borel sets $\mathcal{B}$ and the Lebesgue measurable sets $\mathcal{M}$.

To begin, we note that a Lebesgue measurable set differs from a Borel set by a set 
of measure zero. 

Theorem 16.21. For a subset $E$ of $\mathbb{R}$, the following are equivalent:

(i) $E$ is measurable.

(ii) For every $\varepsilon > 0$, there exists an open set $G \supset E$ such that $m^*(G \setminus E) < \varepsilon$.

(iii) For every $\varepsilon > 0$, there exists a closed set $F \subset E$ such that $m^*(E \setminus F) < \varepsilon$.

(iv) $E = G \setminus N$, where $G$ is a $G_\delta$-set and $N$ is a null set.

(v) $E = F \cup N$, where $F$ is an $F_\sigma$-set and $N$ is a null set.

proof. If E is measurable, then certainly both (ii) and (iii) hold. Also, since null 
sets and Borel sets are measurable, either (iv) or (v) implies that E is measurable. 
Thus, it is enough to show that (ii) implies (iv) and that (iii) implies (v). 
(Why?) 
So, suppose that (ii) holds. Then, for each $n$, there is an open set $G_n$ such that $E \subset G_n$ and $m^*(G_n \setminus E) < 1/n$. Let $G = \bigcap_{n=1}^{\infty} G_n$. Clearly, $G$ is a $G_\delta$-set; moreover, $N = G \setminus E$ is a null set because it is contained in $G_n \setminus E$ and so has outer measure at most $1/n$ for any $n$. Since $E \subset G$, we have $E = G \setminus N$; that is, (iv) holds. The proof that (iii) implies (v) is very similar. □

Corollary 16.22. If $m(E) = 0$, then $E$ is contained in a Borel set $G$ with $m(G) = 0$.

The conclusion to be drawn here is this: A Lebesgue measurable set is a Borel set plus (or minus) a subset of a Borel set of measure zero. While a subset of a Borel set need not be a Borel set (as we will see later), a subset of a null set is always a null set. Thus there are more measurable sets than Borel sets. In fact, it can be shown, by using transfinite induction, that the Borel $\sigma$-algebra $\mathcal{B}$ has cardinality $\mathfrak{c}$ while, as we have seen, the Lebesgue $\sigma$-algebra $\mathcal{M}$ has cardinality $2^{\mathfrak{c}}$.

The Lebesgue measurable sets are said to be complete because every subset of a 
null set is again measurable. In fact, the Lebesgue measurable sets are the completion 
of the Borel sets (see Exercises 56 and 57). 


EXERCISES 

55. Complete the proof of Theorem 16.21. 

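56. Given a $\sigma$-algebra $\mathcal{A}$ of subsets of $\mathbb{R}$, let

$$\overline{\mathcal{A}} = \{E \cup N : E \in \mathcal{A} \text{ and } N \subset F \in \mathcal{A} \text{ with } m(F) = 0\}.$$

$\overline{\mathcal{A}}$ is called the completion of $\mathcal{A}$ (with respect to $m$). Show that $\overline{\mathcal{A}}$ is a $\sigma$-algebra. [Hint: First show that $\overline{\mathcal{A}}$ is an algebra.]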

57. Prove Corollary 16.22, thus showing that $\mathcal{M} = \overline{\mathcal{B}}$, the completion of the Borel $\sigma$-algebra.

▷ 58. Suppose that $m^*(E) < \infty$. Prove that $E$ is measurable if and only if, for every $\varepsilon > 0$, there is a finite union of bounded intervals $A$ such that $m^*(E \,\triangle\, A) < \varepsilon$ (where $E \,\triangle\, A$ is the symmetric difference of $E$ and $A$).

▷ 59. If $E$ is a Borel set, show that $E + x$ and $rE$ are Borel sets for any $x, r \in \mathbb{R}$. [Hint: Show, for example, that $\mathcal{A} = \{E : E + x \in \mathcal{B}\}$ is a $\sigma$-algebra containing the intervals.]

▷ 60. If $E$ is a measurable set, show that $E + x$ and $rE$ are measurable for any $x, r \in \mathbb{R}$. [Hint: Use Theorem 16.21.]


Our next result should be viewed as a continuity property of Lebesgue measure. 

Theorem 16.23. Let $(E_n)$ be a sequence of measurable sets.

(i) If $E_n \subset E_{n+1}$ for each $n$, then $m\big(\bigcup_{n=1}^{\infty} E_n\big) = \lim_{n\to\infty} m(E_n)$.

(ii) If $E_n \supset E_{n+1}$ for each $n$, and if some $E_k$ has $m(E_k) < \infty$, then $m\big(\bigcap_{n=1}^{\infty} E_n\big) = \lim_{n\to\infty} m(E_n)$.





proof. Please note that, in either case, $\bigcup_{n=1}^{\infty} E_n$ and $\bigcap_{n=1}^{\infty} E_n$ are measurable. The “trick” in each case is to manufacture a disjoint union of sets and appeal to the countable additivity of $m$.

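First, suppose that $E_n \subset E_{n+1}$ for each $n$. Then, $m(E_n) \le m(E_{n+1})$ for all $n$, and hence $\lim_{n\to\infty} m(E_n) = \sup_n m(E_n)$ exists and is at most $m\big(\bigcup_{n=1}^{\infty} E_n\big)$. Of course, if some $E_n$ has infinite measure, then so does $\bigcup_{n=1}^{\infty} E_n$ (and $\lim_{n\to\infty} m(E_n) = \infty$ as well); thus, we may assume that each $E_n$ has finite measure. Next, notice that

$$\bigcup_{n=1}^{\infty} E_n = E_1 \cup \bigcup_{n=1}^{\infty} (E_{n+1} \setminus E_n),$$

and hence, since $m(E_n) < \infty$ for all $n$, we get

$$\begin{aligned}
m\Big(\bigcup_{n=1}^{\infty} E_n\Big) &= m(E_1) + \sum_{n=1}^{\infty} m(E_{n+1} \setminus E_n) \\
&= m(E_1) + \sum_{n=1}^{\infty} \big[m(E_{n+1}) - m(E_n)\big] \\
&= \lim_{n\to\infty} m(E_{n+1}).
\end{aligned}$$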


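Next, suppose that $E_n \supset E_{n+1}$ for each $n$. Then, $m(E_n) \ge m(E_{n+1})$ for all $n$ and, again, $\lim_{n\to\infty} m(E_n) = \inf_n m(E_n)$ exists and is at least $m\big(\bigcap_{n=1}^{\infty} E_n\big)$. Now, if some $E_k$ has finite measure, then, by relabeling, we may simply suppose that $E_1$ has finite measure. (Why does this work?) Then, since

$$E_1 \setminus \bigcap_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} (E_n \setminus E_{n+1}),$$

we have

$$\begin{aligned}
m(E_1) - m\Big(\bigcap_{n=1}^{\infty} E_n\Big) &= m\Big(E_1 \setminus \bigcap_{n=1}^{\infty} E_n\Big) \\
&= \sum_{n=1}^{\infty} m(E_n \setminus E_{n+1}) \\
&= \sum_{n=1}^{\infty} \big[m(E_n) - m(E_{n+1})\big] \\
&= m(E_1) - \lim_{n\to\infty} m(E_n).
\end{aligned}$$

Hence, $m\big(\bigcap_{n=1}^{\infty} E_n\big) = \lim_{n\to\infty} m(E_n)$. □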


If we think of $\mathcal{M}$ as a lattice, where $A \le B$ means that $A \subset B$, then $\bigcup_{n=1}^\infty E_n$ is the same as $\sup_n E_n$ for an increasing sequence of sets $(E_n)$. Likewise, $\bigcap_{n=1}^\infty E_n$ is the same as $\inf_n E_n$ for a decreasing sequence of sets $(E_n)$. Thus, the conclusion of the theorem is that $m(\sup_n E_n) = \sup_n m(E_n)$ for an increasing sequence of measurable sets $(E_n)$ and $m(\inf_n E_n) = \inf_n m(E_n)$ for a decreasing sequence of measurable sets $(E_n)$, provided that $\inf_n m(E_n)$ is finite. From this point of view, Theorem 16.23 is a continuity result.

In particular, notice that if $(E_n)$ decreases to the empty set $\emptyset$, and if some $E_k$ has finite measure, then $m(E_n)$ decreases to 0. This says that $m$ is “continuous at $\emptyset$” as a function on $\mathcal{M}$ (for more details, see Exercise 66). Also, note that if $E$ is any measurable set, then $m(E) = \lim_{n\to\infty} m(E \cap [-n, n])$. If, in addition, $m(E) < \infty$, then we could also write $\lim_{n\to\infty} m(E \setminus [-n, n]) = 0$.
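As a concrete check of the last remark, here is a small Python sketch built on an illustrative set of our own choosing (it is not from the text): $E = \bigcup_{k \ge 1} [k, k + 2^{-k}]$, for which $m(E) = \sum_k 2^{-k} = 1$. Exact rational arithmetic shows that $m(E \cap [-n, n]) = 1 - 2^{-(n-1)}$ increases to $m(E)$, so that $m(E \setminus [-n, n]) = 2^{-(n-1)} \to 0$. The helper names `piece`, `overlap`, and `truncated_measure` are hypothetical.

```python
from fractions import Fraction


def piece(k):
    """The k-th interval [k, k + 2^-k] of the example set E (k >= 1)."""
    return Fraction(k), Fraction(k) + Fraction(1, 2**k)


def overlap(a, b, c, d):
    """Length of [a, b] ∩ [c, d] (zero if they are disjoint)."""
    return max(Fraction(0), min(b, d) - max(a, c))


def truncated_measure(n):
    """m(E ∩ [-n, n]); pieces with k > n lie to the right of n, so they contribute nothing."""
    return sum((overlap(*piece(k), -n, n) for k in range(1, n + 1)), Fraction(0))


for n in range(1, 6):
    inside = truncated_measure(n)
    # m(E) = 1 for this example, so 1 - inside is m(E \ [-n, n]).
    print(n, inside, 1 - inside)
```

The pieces are pairwise disjoint, so adding their overlaps with $[-n, n]$ gives the measure of the intersection directly.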

As a corollary to Theorem 16.23, we have the Borel-Cantelli lemma.

Corollary 16.24. If each $E_n$ is measurable, and if $\sum_{n=1}^\infty m(E_n) < \infty$, then

$$m\Big(\bigcap_{n=1}^\infty \bigcup_{k=n}^\infty E_k\Big) = 0.$$

Corollary 16.25. For any set $E \subset \mathbb{R}$, we have

$$m^*(E) = \inf\{m(G) : E \subset G \text{ and } G \text{ is open}\}.$$

If $E$ is measurable, then we also have

$$m(E) = \sup\{m(K) : K \subset E \text{ and } K \text{ is compact}\}.$$

Proof. The first formula follows from Corollary 16.6. For the second, suppose that $E$ is measurable. For each $n$, choose a compact set $K_n \subset E \cap [-n, n]$ such that $m(K_n) > m(E \cap [-n, n]) - 1/n$. Since $m(E \cap [-n, n])$ increases to $m(E)$, it follows that

$$m(E) \ge \sup\{m(K) : K \subset E \text{ and } K \text{ is compact}\} \ge \sup_n m(K_n) \ge \limsup_{n\to\infty} m(K_n) = m(E). \quad □$$

Our continuity result also allows us to “fine tune” the characterization of measurable 
sets given by Theorem 16.22 in the case of sets with finite outer measure (or bounded 
sets). 

Corollary 16.26. Suppose that $m^*(E) < \infty$. Then $E$ is measurable if and only if for every $\varepsilon > 0$, there exists a compact set $F \subset E$ such that $m(F) > m^*(E) - \varepsilon$.


EXERCISES 

61. Find a sequence of measurable sets $(E_n)$ that decrease to $\emptyset$, but with $m(E_n) = \infty$ for all $n$.

62. If $E_n$ is measurable for each $n$, show that $m(\liminf_{n\to\infty} E_n) \le \liminf_{n\to\infty} m(E_n)$ and also that $m(\limsup_{n\to\infty} E_n) \ge \limsup_{n\to\infty} m(E_n)$, provided that $m\big(\bigcup_{n=k}^\infty E_n\big) < \infty$ for some $k \ge 1$.

> 63. Prove Corollary 16.24. 

> 64. Prove Corollary 16.26. 

65. Let $\mathcal{M}_1$ denote the measurable subsets of $[0, 1]$. Given $E, F \in \mathcal{M}_1$, define $E \sim F$ if $m(E \triangle F) = 0$. Prove that $\sim$ is an equivalence relation.

66. In the notation of Exercise 65, define $d(E, F) = m(E \triangle F)$ for $E, F \in \mathcal{M}_1$. Prove that $d$ defines a pseudometric on $\mathcal{M}_1$. (That is, $d$ induces a metric on $\mathcal{M}_1/\sim$, the set of equivalence classes under equality a.e.)

67. In the notation of Exercise 65, show that $m$ is continuous as a function on $(\mathcal{M}_1, d)$. [Hint: Since $m$ is additive, you only need to check continuity at one point; $\emptyset$ is a convenient choice.]

68. Prove that $(\mathcal{M}_1, d)$ is complete. [Hint: If $(E_n)$ is $d$-Cauchy, then, by passing to a subsequence, you may assume that $d(E_n, E_{n+1}) < 2^{-n}$. Now argue that $(E_n)$ converges to, say, $\limsup_{n\to\infty} E_n$.]


For our final topic in this section, we further demonstrate the interplay between 
Lebesgue measure and the topology of R by presenting an important result concerning 
coverings by families of intervals. 

We say that a collection $\mathcal{C}$ of closed, nontrivial intervals in $\mathbb{R}$ forms a Vitali cover for a subset $E$ of $\mathbb{R}$ if, for any $x \in E$ and any $\varepsilon > 0$, there is an interval $I \in \mathcal{C}$ with $x \in I$ and $m(I) < \varepsilon$. In other words, $\mathcal{C}$ is a Vitali cover for $E$ if, for every $\varepsilon > 0$,

$$E \subset \bigcup\{I : I \in \mathcal{C} \text{ and } m(I) < \varepsilon\}.$$

In particular, notice that if $\mathcal{C}$ is a Vitali cover for $E$, then so is the collection

$$\{I \in \mathcal{C} : m(I) < \varepsilon\}$$

for any (fixed) $\varepsilon > 0$. Loosely speaking, the intervals in $\mathcal{C}$ form a neighborhood base for the points in $E$; that is, given a point $x \in E$ and any open set $U$ containing $x$, we can always find an interval $I$ from $\mathcal{C}$ with $x \in I \subset U$. (How?)

Vitali’s Covering Theorem 16.27. Let $E$ be a set of finite outer measure, and let $\mathcal{C}$ be a Vitali cover for $E$. Then there exist countably many pairwise disjoint intervals $(I_n)$ in $\mathcal{C}$ such that

$$m\Big(E \setminus \bigcup_n I_n\Big) = 0.$$

Proof. We can simplify things a bit by making two observations: First, since $m^*(E) < \infty$, there is an open set $U$ containing $E$ with $m(U) < \infty$. Next, given $x \in E \subset U$ and $\varepsilon > 0$, there is an interval $I \in \mathcal{C}$ such that $x \in I \subset U$ and $m(I) < \varepsilon$. Thus, the collection $\{I \in \mathcal{C} : I \subset U\}$ is still a Vitali cover for $E$. Since it is enough to prove the theorem for this collection, we may simply suppose that each element of $\mathcal{C}$ is already contained in $U$.

To begin, choose any interval $I_1$ in $\mathcal{C}$. If $m(E \setminus I_1) = 0$, we are done; otherwise, we continue to choose intervals from $\mathcal{C}$ according to the following scheme: Suppose that pairwise disjoint, closed intervals $I_1, \ldots, I_n$ have been constructed with $m^*\big(E \setminus \bigcup_{k=1}^n I_k\big) > 0$. We want to choose $I_{n+1}$ so that it is the “next biggest” interval in $\mathcal{C}$ that is disjoint from $I_1, \ldots, I_n$. To accomplish this, consider the intervals in $\mathcal{C}$ that are completely contained in the open set

$$G_n = U \setminus \bigcup_{k=1}^n I_k.$$

Since $E \setminus \bigcup_{k=1}^n I_k \ne \emptyset$, and since $\mathcal{C}$ is a Vitali cover for $E$, such intervals exist; notice that any such interval $J$ will also satisfy $0 < m(J) \le m(U)$ (since the intervals in $\mathcal{C}$ are nontrivial). Setting

$$k_n = \sup\{m(J) : J \in \mathcal{C} \text{ and } J \subset G_n\},$$

it is clear that $0 < k_n < \infty$. We now choose $I_{n+1} \in \mathcal{C}$ with $m(I_{n+1}) > k_n/2$ and $I_{n+1} \subset G_n = U \setminus \bigcup_{k=1}^n I_k$. Obviously, $I_{n+1}$ is disjoint from $I_1, \ldots, I_n$. If $m^*\big(E \setminus \bigcup_{k=1}^{n+1} I_k\big) = 0$, the construction terminates and the theorem is proved; otherwise we continue, choosing $I_{n+2}$, and so on.

If our construction does not terminate in finitely many steps, then it yields a sequence $(I_k)$ of pairwise disjoint intervals in $\mathcal{C}$ with $I_k \subset U$ and, of course, $\sum_{k=1}^\infty m(I_k) \le m(U) < \infty$. It only remains to show that $m\big(E \setminus \bigcup_{k=1}^\infty I_k\big) = 0$. To this end, first notice that each $J \in \mathcal{C}$ must hit some $I_n$. Indeed, if $J \cap \big(\bigcup_{k=1}^n I_k\big) = \emptyset$ for all $n$, then we would have $m(J) \le k_n < 2m(I_{n+1}) \to 0$ (as $n \to \infty$), which contradicts the fact that $m(J) > 0$.

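Finally, let $\varepsilon > 0$ and choose $N$ so that $\sum_{k=N+1}^\infty m(I_k) < \varepsilon$. Given $x \in E \setminus \bigcup_{k=1}^N I_k \subset G_N$, choose an interval $J \in \mathcal{C}$ with $x \in J$ and $J \cap \big(\bigcup_{k=1}^N I_k\big) = \emptyset$. By our observation above, we know that there is a smallest $n$ such that $J \cap I_n \ne \emptyset$. Necessarily, $n > N$ and $m(J) < 2m(I_n)$. (Why?) Thus, if we let $J_n$ be the closed interval having the same midpoint as $I_n$ but with radius five times that of $I_n$, that is, with $m(J_n) = 5m(I_n)$, then it is easy to see that $J \subset J_n$. (Why?) In other words, what we have shown is that

$$E \setminus \bigcup_{k=1}^\infty I_k \subset E \setminus \bigcup_{k=1}^N I_k \subset \bigcup_{k=N+1}^\infty J_k,$$

and so

$$m^*\Big(E \setminus \bigcup_{k=1}^\infty I_k\Big) \le m^*\Big(E \setminus \bigcup_{k=1}^N I_k\Big) \le \sum_{k=N+1}^\infty m(J_k) = 5 \sum_{k=N+1}^\infty m(I_k) < 5\varepsilon.$$

Since $\varepsilon$ is arbitrary, we get $m\big(E \setminus \bigcup_{k=1}^\infty I_k\big) = 0$. □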
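The selection rule in this proof is a greedy one. For a finite family of closed intervals the supremum $k_n$ is attained, so we may simply take the longest remaining interval; then every interval $J$ of the family meets some chosen interval at least as long as itself, and in particular $J$ lies inside the five-fold enlargement $J_n$ used above. The following Python sketch checks this on a randomly generated family. The family, the function names, and the use of `Fraction` for exact arithmetic are our own illustrative choices, and the finite setting sidesteps the limiting argument entirely.

```python
import random
from fractions import Fraction


def disjoint(I, J):
    """Closed intervals I = [a, b] and J = [c, d] are disjoint iff one ends before the other starts."""
    return I[1] < J[0] or J[1] < I[0]


def length(I):
    return I[1] - I[0]


def greedy_disjoint(family):
    """Repeatedly pick the longest interval disjoint from everything already chosen."""
    chosen = []
    for J in sorted(family, key=length, reverse=True):
        if all(disjoint(J, I) for I in chosen):
            chosen.append(J)
    return chosen


def enlarge(I, factor):
    """Closed interval with the same midpoint as I and `factor` times its length."""
    mid, half = (I[0] + I[1]) / 2, factor * length(I) / 2
    return (mid - half, mid + half)


def inside(J, K):
    return K[0] <= J[0] and J[1] <= K[1]


rng = random.Random(0)
family = []
for _ in range(300):
    a = Fraction(rng.randint(0, 1000), 1000)
    family.append((a, a + Fraction(rng.randint(1, 80), 1000)))

chosen = greedy_disjoint(family)
# Every J meets some chosen I with length(I) >= length(J), hence J lies in enlarge(I, 5).
covered = all(
    any(not disjoint(J, I) and inside(J, enlarge(I, 5)) for I in chosen) for J in family
)
print(len(chosen), covered)
```

With an exact maximum at each step, even the three-fold enlargement suffices; the factor 5 in the proof pays for choosing $I_{n+1}$ only with $m(I_{n+1}) > k_n/2$.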

Corollary 16.28. Let $E$ be a set of finite outer measure, and let $\mathcal{C}$ be a Vitali cover for $E$. Given $\varepsilon > 0$, there are finitely many pairwise disjoint intervals $I_1, \ldots, I_n$ in $\mathcal{C}$ such that

$$m^*\Big(E \setminus \bigcup_{k=1}^n I_k\Big) < \varepsilon.$$

Corollary 16.29. An arbitrary union of intervals is measurable. That is, if $(I_\alpha)_{\alpha\in A}$ is any collection of intervals in $\mathbb{R}$, then the set $E = \bigcup_{\alpha\in A} I_\alpha$ is measurable.


EXERCISES 

69. Let $E$ be a set of finite outer measure, and suppose that for some sequence of intervals $(I_n)$ we have $m\big(E \setminus \bigcup_{n=1}^\infty I_n\big) = 0$. Show that $m^*(E) \le \sum_{n=1}^\infty m(I_n)$.

70. Prove Corollary 16.29. [Hint: Let $\mathcal{C}$ be the collection of all closed intervals $J$ such that $J \subset I_\alpha$ for some $\alpha$.]


A Nonmeasurable Set 

Well, now for the bad news: There exist nonmeasurable sets. In this section we will present an example due to Vitali, dating back to 1905. You may find it easier to follow the example if you first know where it comes from. We identify the interval $[0, 1)$ with the unit circle in $\mathbb{C}$ (or in $\mathbb{R}^2$) under the map $x \mapsto e^{2\pi i x}$ (or $(\cos 2\pi x, \sin 2\pi x)$). That is, $[0, 1)$ is identified with $[0, 2\pi)$, and then $[0, 2\pi)$ is wrapped around the circle, in the usual way, by identifying each angle in $[0, 2\pi)$ with the point it determines on the circle (see Figure 16.3).



Under this identification, the addition of angles corresponds to addition (mod 1). Specifically, given $x, y \in [0, 1)$, we define

$$x + y \pmod 1 = \begin{cases} x + y, & \text{if } x + y < 1, \\ x + y - 1, & \text{if } x + y \ge 1. \end{cases}$$


Given a subset $E$ of $[0, 1)$, we also define the translate of $E$ under addition (mod 1) by

$$E + x \pmod 1 = \{a + x \pmod 1 : a \in E\}.$$

In this way, translation by $x$ (mod 1) in the interval $[0, 1)$ corresponds to rotation through an angle $2\pi x$ on the circle (see Figure 16.4).

It is easy to see that addition (mod 1) is reasonably well behaved; for example, $x + y \pmod 1 = y + x \pmod 1$. Better still, Lebesgue measure is invariant under translation (mod 1).


Lemma 16.30. Let $E \subset [0, 1)$ and $x \in [0, 1)$. If $E$ is measurable, then so is $E + x \pmod 1$. Moreover, in this case, $m(E + x \pmod 1) = m(E)$.

[Figure: $E = E_1 \cup E_2$ and the translate $E_2 + (x - 1)$]



Proof. Put $E_1 = E \cap [0, 1 - x)$ and $E_2 = E \setminus E_1 = E \cap [1 - x, 1)$. Clearly, $E_1$ and $E_2$ are measurable and disjoint, and so $m(E) = m(E_1) + m(E_2)$. Now it is easy to check that

$$E + x \pmod 1 = [E_1 + x \pmod 1] \cup [E_2 + x \pmod 1] = [E_1 + x] \cup [E_2 + (x - 1)],$$

where the last two sets are ordinary translates. What’s more, these last two sets are measurable (see Exercise 60) and disjoint, so $E + x \pmod 1$ is measurable. Also, by translation invariance,

$$m(E + x \pmod 1) = m(E_1 + x) + m(E_2 + (x - 1)) = m(E_1) + m(E_2) = m(E). \quad □$$
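The splitting in this proof is easy to make concrete when $E$ is a finite union of half-open intervals $[a, b) \subset [0, 1)$. The Python sketch below implements addition (mod 1), builds $E + x \pmod 1$ from the ordinary translates $E_1 + x$ and $E_2 + (x - 1)$, and confirms that total length is unchanged. The function names are hypothetical and exact `Fraction` arithmetic is used to avoid rounding; the sketch only illustrates the lemma for such simple sets and does not prove it.

```python
from fractions import Fraction


def add_mod1(x, y):
    """x + y (mod 1) for x, y in [0, 1), as defined in the text."""
    s = x + y
    return s if s < 1 else s - 1


def translate_mod1(E, x):
    """E + x (mod 1) for E a list of disjoint intervals [a, b) inside [0, 1).

    Each [a, b) is split at 1 - x: the part in [0, 1 - x) (from E_1) shifts by x,
    and the part in [1 - x, 1) (from E_2) shifts by x - 1.
    """
    cut = 1 - x
    out = []
    for a, b in E:
        if a < min(b, cut):  # piece of E_1
            out.append((a + x, min(b, cut) + x))
        if max(a, cut) < b:  # piece of E_2
            out.append((max(a, cut) + x - 1, b + x - 1))
    return sorted(out)


def total_length(E):
    return sum((b - a for a, b in E), Fraction(0))


E = [(Fraction(1, 10), Fraction(3, 10)), (Fraction(1, 2), Fraction(9, 10))]
x = Fraction(2, 5)
shifted = translate_mod1(E, x)
print(shifted)
print(total_length(E), total_length(shifted))
```

Here the second interval $[1/2, 9/10)$ straddles the cut point $1 - x = 3/5$, so it is split into a piece that moves right to $[9/10, 1)$ and a piece that wraps around to $[0, 3/10)$.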

We have introduced arithmetic (mod 1) so that we may consider a curious equivalence relation on $[0, 1)$. Namely, given $x, y \in [0, 1)$, we define

$$x \sim y \iff x - y \in \mathbb{Q} \iff y \in \mathbb{Q} + x \pmod 1.$$

This equivalence relation partitions $[0, 1)$ into disjoint equivalence classes $[x]_\sim = \mathbb{Q} + x \pmod 1$. That is, $[0, 1)$ is the disjoint union of the distinct cosets of $\mathbb{Q}$ under addition (mod 1). Since each of the sets $\mathbb{Q} + x \pmod 1$ is countable, there are evidently uncountably many distinct equivalence classes.

We next call on the Axiom of Choice to choose a full set $N$ of distinct coset representatives for our equivalence relation. That is, $N$ contains precisely one element from each equivalence class and no more. Thus, given any $x \in [0, 1)$, there is a unique $y \in N$ such that $x \sim y$. Moreover, for $x, y \in N$, we have $x \sim y \iff x = y$. Please note that $N$ is necessarily an uncountable set.

The idea here is that we now reverse the process described above and write $[0, 1)$ as a union of cosets, or translates (mod 1) of $N$. Indeed, if, for each rational $r \in \mathbb{Q} \cap [0, 1)$, we set $N_r = N + r \pmod 1$, then

$$[0, 1) = \bigcup_{r \in \mathbb{Q} \cap [0,1)} N_r \quad \text{and} \quad N_r \cap N_s = \emptyset \text{ for } r \ne s.$$

The first claim is easy: Given $x \in [0, 1)$, we know that $x \sim y$ for some $y \in N$, and hence $x = y + r \pmod 1$ for some $r \in \mathbb{Q} \cap [0, 1)$; that is, $x \in N_r$ for some $r \in \mathbb{Q} \cap [0, 1)$. The other containment is obvious since $N_r \subset [0, 1)$ for any $r \in \mathbb{Q} \cap [0, 1)$. Next, to see that the $N_r$ are pairwise disjoint, note that if $x \in N_r \cap N_s$, then we would have

$$y + r \pmod 1 = x = z + s \pmod 1$$

for some $y, z \in N$ and some $r, s \in \mathbb{Q} \cap [0, 1)$. But then, $y - z \in \mathbb{Q}$; that is,

$$y \sim z \implies y = z \implies r = s,$$

since $0 \le r, s < 1$. Thus, either $N_r = N_s$ (for $r = s$) or $N_r \cap N_s = \emptyset$ (for $r \ne s$). Finally, putting all of these observations to work, we have

Theorem 16.31. $N$ is nonmeasurable.

Proof. If $N$ were measurable, then all of the $N_r$ would be measurable too, by Lemma 16.30. Moreover, we would have $m(N_r) = m(N)$ for all $r$. Consequently,

$$1 = m([0, 1)) = m\Big(\bigcup_{r \in \mathbb{Q} \cap [0,1)} N_r\Big) = \sum_{r \in \mathbb{Q} \cap [0,1)} m(N_r) = \sum_{r \in \mathbb{Q} \cap [0,1)} m(N).$$

Oops! The last sum is $0$ if $m(N) = 0$ and $\infty$ if $m(N) > 0$, so we cannot assign any value at all to $m(N)$ without arriving at a contradiction! Thus, $N$ is nonmeasurable. □

Notice that by repeating the argument above, using $m^*$ and countable subadditivity in place of $m$ and countable additivity, we must have $0 < m^*(N) \le 1$. (Why?) That is, we now have our example showing that $m^*$ is not countably additive on all of $\mathcal{P}(\mathbb{R})$.

Corollary 16.32. There exists a sequence of pairwise disjoint subsets $(E_n)$ of $[0, 1)$ with $m^*\big(\bigcup_{n=1}^\infty E_n\big) < \sum_{n=1}^\infty m^*(E_n)$.

This construction of a nonmeasurable set used only the countable additivity and the translation invariance of Lebesgue measure, and so we have actually proved something more.

Theorem 16.33. Suppose that $\mathcal{A}$ is a $\sigma$-algebra of subsets of $[0, 1)$, and that $\mu : \mathcal{A} \to [0, \infty]$ is countably additive and translation-invariant. If $N \in \mathcal{A}$, then we must have either $\mu([0, 1)) = 0$ or $\mu([0, 1)) = \infty$. In other words, if $\mu([0, 1)) = 1$, then $N \notin \mathcal{A}$ and hence $\mathcal{A} \ne \mathcal{P}([0, 1))$.


EXERCISES 

71. Prove Corollary 16.32. 

72. Find a decreasing sequence of sets $E_1 \supset E_2 \supset \cdots$ such that $m^*(E_1) < \infty$ and $m^*\big(\bigcap_{n=1}^\infty E_n\big) < \lim_{n\to\infty} m^*(E_n)$.

> 73. If $E$ is a measurable subset of the nonmeasurable set $N$ (constructed in this section), prove that $m(E) = 0$. [Hint: Consider $E_r = E + r \pmod 1$, for $r \in \mathbb{Q} \cap [0, 1)$.]

> 74. If $m^*(A) > 0$, show that $A$ contains a nonmeasurable set. [Hint: We must have $m^*(A \cap [n, n + 1)) > 0$ for some $n \in \mathbb{Z}$, and so we may suppose that $A \subset [0, 1)$. (How?) It follows from Exercise 73 that one of the sets $E_r = A \cap N_r$ is nonmeasurable. (Why?)]

75. Measurable sets aren’t necessarily preserved by continuous maps, not even sets of measure zero. Here’s an old example: Recall that the Cantor function $f : [0, 1] \to [0, 1]$ maps the Cantor set $\Delta$ onto $[0, 1]$. That is, the Cantor function takes a set of measure zero and “spreads it out” to a set of measure one. Conclude that $f$ maps some measurable set onto a nonmeasurable set.


Other Definitions 

There are several popular approaches to defining Lebesgue measurable sets. The approach that we have adopted takes full advantage of the topology of the real line, along with certain intrinsic properties of outer measure $m^*$, to arrive at the notion of a measurable set. The disadvantage to this approach is that it is hard to generalize to the case of an “abstract” measure. For this reason, many authors prefer a different approach, one that was first suggested by Carathéodory. In this section we will give a brief overview of Carathéodory’s definition.

To begin, let’s recall Lebesgue’s original definition: Given a subset $E$ of $[a, b]$, Lebesgue would say that $E$ is measurable if

$$b - a = m^*(E) + m^*([a, b] \setminus E).$$

Lebesgue’s definition extends to unbounded sets $E$ using the same observation that we used earlier: It is enough to know that $E \cap [a, b]$ is measurable for any bounded interval $[a, b]$. Thus, we could rephrase the requirement as

$$m^*([a, b]) = m^*([a, b] \cap E) + m^*([a, b] \cap E^c)$$

for every interval $[a, b]$. Written this way, the requirement for measurability is that $E$ and $E^c$ should split every interval into two pieces whose outer measures add up to the full measure of the interval. Carathéodory’s idea is to replace intervals by arbitrary subsets of $\mathbb{R}$. That is, Carathéodory calls a set $E$ measurable if

$$m^*(A) = m^*(A \cap E) + m^*(A \cap E^c) \qquad (16.1)$$

for every subset $A$ of $\mathbb{R}$. In other words, a measurable set is required to split every set “nicely.”
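To see condition (16.1) in action, here is a Python sketch in which $A$ and $E$ are both finite unions of intervals, so every outer measure involved is just a total length. The helper names (`normalize`, `measure`, `intersect`, `minus`) are our own, and endpoints are ignored since they have measure zero. This only illustrates the identity for very tame sets; it is no substitute for Theorem 16.34, whose force lies in allowing every $A \subset \mathbb{R}$.

```python
from fractions import Fraction


def normalize(S):
    """Sort, drop empty intervals, and merge overlapping ones in a list of intervals (a, b)."""
    out = []
    for a, b in sorted(I for I in S if I[0] < I[1]):
        if out and a <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], b))
        else:
            out.append((a, b))
    return out


def measure(S):
    """Total length of a finite union of intervals."""
    return sum((b - a for a, b in normalize(S)), Fraction(0))


def intersect(S, T):
    return normalize([(max(a, c), min(b, d)) for a, b in S for c, d in T])


def minus(S, T):
    """S minus T (up to endpoints, which have measure zero)."""
    result = normalize(S)
    for c, d in normalize(T):
        pieces = []
        for a, b in result:
            pieces += [(a, min(b, c)), (max(a, d), b)]
        result = normalize(pieces)
    return result


A = [(Fraction(0), Fraction(3)), (Fraction(5), Fraction(7))]
E = [(Fraction(1), Fraction(2)), (Fraction(6), Fraction(10))]
lhs = measure(A)
rhs = measure(intersect(A, E)) + measure(minus(A, E))  # A ∩ E^c is A \ E
print(lhs, rhs)
```

Removing an interval $[c, d]$ from $[a, b]$ leaves at most the two pieces $[a, \min(b, c)]$ and $[\max(a, d), b]$; `normalize` discards whichever of them is empty.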

Now Carathéodory’s requirement is stronger than Lebesgue’s, and hence a set that is measurable by Carathéodory’s standard is measurable by Lebesgue’s (and, hence, by ours too). It may seem surprising that the two definitions are actually equivalent, at least until you recall that outer measure is completely determined by its values on intervals.

The hard work in using Carathéodory’s definition is cut in half by two simple observations: For one, it is only necessary to test

$$m^*(A) \ge m^*(A \cap E) + m^*(A \cap E^c),$$

since countable subadditivity always gives the other inequality. For another, it is now clear that we only have to consider sets $A$ with $m^*(A) < \infty$. (Why?) From here, we would start down the same road that we traveled earlier: We would check that this definition yields an algebra of measurable sets (this is the easy part) and, in fact, a $\sigma$-algebra of sets (and this is where the real fighting takes place). Ultimately, we would arrive at the same conclusion: Measurable sets are Borel sets plus or minus null sets. In any case, using the machinery of Theorem 16.20, it is a simple matter to check that Carathéodory’s notion of measurability coincides with our own.

Theorem 16.34. Let $E \subset \mathbb{R}$. Then $E$ is measurable if and only if $m^*(A) = m^*(A \cap E) + m^*(A \cap E^c)$ for every subset $A$ of $\mathbb{R}$.

Proof. First suppose that $E$ is measurable. Given $A$, choose a $G_\delta$-set $G$ containing $A$ such that $m^*(A) = m(G)$. (How?) Then, since both $E$ and $G$ are measurable,

$$m^*(A) = m(G) = m(G \cap E) + m(G \cap E^c) \ge m^*(A \cap E) + m^*(A \cap E^c).$$

Hence, equation (16.1) holds.

Next, suppose that $m^*(A) = m^*(A \cap E) + m^*(A \cap E^c)$ for every subset $A$ of $\mathbb{R}$. If $m^*(E) < \infty$, choose a $G_\delta$-set $G$ containing $E$ such that $m^*(E) = m(G)$. Then (putting $A = G$ in equation (16.1)),

$$m(G) = m^*(G \cap E) + m^*(G \cap E^c) = m^*(E) + m^*(G \setminus E).$$

Hence, $m^*(G \setminus E) = 0$ and, in particular, $G \setminus E$ is measurable. It follows that $E = G \setminus (G \setminus E)$ is measurable, too. If $m^*(E) = \infty$, we apply the first part of this argument to each of the sets $E_n = E \cap [-n, n]$, where $n \in \mathbb{N}$. For each $n$, we choose a $G_\delta$-set $G_n$ containing $E_n$ with $m^*(G_n \setminus E_n) = 0$. Then $E$ is contained in the measurable set $G = \bigcup_{n=1}^\infty G_n$, and $m^*(G \setminus E) \le \sum_{n=1}^\infty m^*(G_n \setminus E) = 0$. As before, it follows that $E$ is measurable. □


EXERCISES 


76. If $m^*(E) = 0$, check that $E$ satisfies Carathéodory’s condition (16.1).

77. If both $E$ and $F$ satisfy Carathéodory’s condition, prove that $E \cup F$, $E \cap F$, and $E \setminus F$ do too. [Hint: It is only necessary to check $E \cup F$. (Why?) For this, use the fact that $A \cap (E \cup F) = (A \cap E) \cup (A \cap E^c \cap F)$.]

78. If $E$ is a measurable subset of $A$, show that $m^*(A) = m(E) + m^*(A \setminus E)$. Thus, $m^*(A \setminus E) = m^*(A) - m(E)$ provided that $m(E) < \infty$.


Notes and Remarks 

The passage quoted at the beginning of the chapter is taken from Grattan-Guinness 
[1970]. 
The interchange of limits and integrals, as in the formula $\lim_{n\to\infty} \int_a^b f_n = \int_a^b \lim_{n\to\infty} f_n$, can be handled successfully by using the Riemann integral in several important cases. However, the proofs of such convergence theorems are typically rather difficult. For more details, see Eberlein [1957], Kestelman [1970], Lewin [1986], Luxemburg [1971], and Riesz [1917].

Lebesgue’s thesis [1902] was based on a series of five short papers, or research announcements, published between the years 1899 and 1901. During the academic year 1902-3, Lebesgue gave the Cours Peccot at the Collège de France; these lectures were published in 1904 in Borel’s monograph series as Leçons sur l’Intégration et la Recherche des Fonctions Primitives. The second edition of the Leçons appeared in 1928 and included several important new results; see Lebesgue [1928]. Lebesgue’s Leçons continues to be an important work. A substantial portion of the notes are devoted to the history of the development of the integral before Lebesgue: over 100 pages in the second edition.

Lebesgue was a prolific expository writer, too. He published several essays on the teaching of mathematics and several expository articles describing his own work. Two of the latter, “Sur le développement de la notion d’intégrale” and “Sur la mesure des grandeurs,” have been translated into English by Kenneth May and appear, along with a short biographical essay, in Lebesgue [1966]. Other expository articles of interest here include Ulam [1943] and Riesz [1920, 1949].

For more on the history of the development of the Lebesgue integral see Hawkins [1970], Hobson [1927, Vol. I], Bliss [1917], and Hildebrandt [1917].

During the 1920s, the newly formed Polish school of mathematicians, headed by Wacław Sierpiński, went a long way toward resolving the various questions associated with the problem of measure. Indeed, the early volumes of Fundamenta Mathematicae contain dozens of important papers on measure theory, analysis, and the foundations of topology and descriptive set theory. Of particular interest here are Banach [1923], Banach and Kuratowski [1930], and Banach and Tarski [1924]. For more on the history of this important journal, see Kuzawa [1970].

For a down-to-earth discussion of the Banach-Tarski paradox, see French [1988]. A detailed proof of the Banach-Tarski theorem in $\mathbb{R}^3$ is given in Stromberg [1977]. As Stromberg points out, an excellent paper related to extensions of Lebesgue measure is Bruckner and Ceder [1974].

Many of the results in this chapter have been adapted from, or at least influenced by, de La Vallée Poussin [1934] and Oxtoby [1971]. It would seem that de La Vallée Poussin was the first to define a measurable set as one that could be well approximated, in terms of outer measure, by open sets and closed sets (although Theorem 16.21 (iii) and (iv) were known to Lebesgue). This approach has the distinct pedagogical advantage of being “hands on”; that is, most of what we need to know can be deduced from first principles without appealing to unintuitive definitions or to the “sleight of hand” of $\sigma$-algebra arguments. As a matter of curiosity, and some small nuisance, de La Vallée Poussin is a difficult name to track down in most library catalogs; you may find selections under any of the four initial letters “D,” “L,” “V,” or “P.” According to Burkill [1964], the most appropriate choice here is “L.”

The “measure” described in Exercise 3, called the outer content of a set, was introduced by Peano [1887] and later by Jordan [1892].

Theorem 16.10 is due to Lebesgue [1902], but Giuseppe Vitali [1904] and W. H. Young [1904] independently discovered the theorem at about the same time. This discovery led Vitali and Young to develop their own theories of measure, which closely mirrored Lebesgue’s; see Hawkins [1970]. The proof of Theorem 16.10 given here is based on the presentation in Oxtoby [1971]. For a proof requiring only advanced calculus, see Botsko [1988].

Exercise 19 is based on the discussion in Riesz and Sz.-Nagy [1955]. Exercises 27 
and 37 are based on the discussion in Wilansky [1953a], but see also Wilansky [1953b] 
and Rudin [1983]. Exercise 42 is cribbed from W. B. Johnson’s lectures on real analysis 
given at The Ohio State University in 1974-75. 

A ring of sets $\mathcal{R}$ is a collection of subsets of a fixed set $X$ that is closed under differences and finite unions. It is easy to see that if the ring $\mathcal{R}$ contains $X$ itself, then $\mathcal{R}$ is an algebra of sets. For a short proof that a ring of sets actually is a ring (in the algebraic sense), see Wilker [1982].

The so-called Steinhaus lemma, Exercise 44, is from Steinhaus [1920] and appears in the first volume of Fundamenta. The elegant proof outlined here is from the Annexe of the same volume. Please compare this result with the observation made back in Chapter Two, also due to Steinhaus, that $\Delta - \Delta = [-1, 1]$. For variations on this theme, along with a few applications, see Chae [1980]. Still more variations and extensions are given in Oxtoby [1971] and Kominek [1983].

Theorem 16.27 is due to Vitali [1905a]. The proof presented here is due to Banach 
[1924] by way of Natanson [1955]. 

According to most sources, the first, and simplest, construction of a nonmeasurable 
set, presented here as Theorem 16.31, was given by Vitali [1905a]. See Van Vleck 
[1908] for a similar construction. Thomas [1985] provides an unusual graph-theoretic 
construction of a nonmeasurable set. Other, less elementary constructions are given in 
Oxtoby [1971]. Theorem 16.33 is from Folland [1984]. See also Mauldin [1979] and 
Briggs and Schaffter [1979]. 

The definition of measurability given in the last section, along with Theorem 16.34, is due to Constantin Carathéodory, who was among the first to develop a general theory of “abstract” measures; see Carathéodory [1918].
Measurable Functions

Recall from our discussion in Chapter Sixteen that Lebesgue’s approach to the integral applies to functions $f$ for which the sets $\{x : a < f(x) < b\}$ are measurable for every $a < b$. In this chapter we will pursue this notion (and then some). What we will find is that such functions are “almost” continuous, but in a somewhat weaker sense than was the case for Riemann integrable functions. This is as it should be, since we expect the class of Lebesgue integrable functions to be larger than the class of Riemann integrable functions.

Given a function $f : D \to \mathbb{R}$, defined on some domain $D$, we say that $f$ is (Lebesgue) measurable if $D$ is measurable and if, for each real $\alpha$, the set

$$\{f > \alpha\} = \{x \in D : f(x) > \alpha\} = f^{-1}((\alpha, \infty))$$

is measurable. In particular, notice that if $D$ is a null set, then every function $f : D \to \mathbb{R}$ is measurable.

The requirement that $D$ be measurable is actually redundant, since

$$D = f^{-1}((-\infty, \infty)) = \bigcup_{n=1}^\infty f^{-1}((-n, \infty)) = \bigcup_{n=1}^\infty \{f > -n\},$$

but there are nevertheless good reasons for repeating this requirement.

As you might expect, we want the collection of measurable functions to be a vector space, an algebra, and so on. Most of these properties will follow easily from what we know about measurable sets (the fact that $\mathcal{M}$ is a $\sigma$-algebra, for example). Before we start on this project, though, let’s first note that we could use any one of several similar definitions for the measurability of functions.

Proposition 17.1. Let $f : D \to \mathbb{R}$, where $D$ is measurable. Then $f$ is measurable if and only if any one of the following holds:

(i) $\{f \ge \alpha\}$ is measurable for all real $\alpha$;

(ii) $\{f < \alpha\}$ is measurable for all real $\alpha$;

(iii) $\{f \le \alpha\}$ is measurable for all real $\alpha$.
Proof. First suppose that $f$ is measurable. Then

$$\{f \ge \alpha\} = f^{-1}([\alpha, \infty)) = f^{-1}\Big(\bigcap_{k=1}^\infty \big(\alpha - \tfrac{1}{k}, \infty\big)\Big) = \bigcap_{k=1}^\infty f^{-1}\big(\big(\alpha - \tfrac{1}{k}, \infty\big)\big) = \bigcap_{k=1}^\infty \{f > \alpha - 1/k\},$$

which is measurable. Thus, (i) holds.

Now, that (i) implies (ii) is obvious, since $\{f < \alpha\} = D \setminus \{f \ge \alpha\} \in \mathcal{M}$. That (ii) implies (iii) follows the same lines as our first observation; in this case, $\{f \le \alpha\} = \bigcap_{k=1}^\infty \{f < \alpha + (1/k)\}$. Finally, that (iii) implies $f$ is measurable is obvious, since $\{f > \alpha\} = D \setminus \{f \le \alpha\}$. □

Now if $f$ is measurable, it is easy to see that the set $\{f = \alpha\}$ is measurable for every real $\alpha$; but this condition alone is not sufficient to ensure measurability (see Exercise 5). Instead, notice that if $f$ is measurable, then the set $\{a < f < b\}$ is measurable for any $a < b$. In fact, we can use this to manufacture another equivalent formulation to include in Proposition 17.1: $f$ is measurable if and only if the set $\{a < f < b\}$ is measurable for any pair of real numbers $a < b$. But why stop there?

Corollary 17.2. Let $f : D \to \mathbb{R}$, where $D$ is measurable. Then $f$ is measurable if and only if $f^{-1}(U)$ is measurable for every open set $U \subset \mathbb{R}$.

The class of functions that give relatively “nice” sets as inverse images of open sets 
is quite large, as we will see. In fact, there are several familiar classes of functions that 
are easily seen to be measurable. 

Corollary 17.3. Continuous functions, monotone functions, step functions, and semicontinuous functions (all defined on some interval in $\mathbb{R}$) are measurable.
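For a step function, the sets in question can be written down explicitly: $\{f > \alpha\}$ is a finite union of intervals, hence measurable (indeed Borel). The Python sketch below assumes a hypothetical representation of a step function on $[a, b)$ by its partition points $a = x_0 < x_1 < \cdots < x_n = b$ and its constant value on each $[x_{i-1}, x_i)$; the function name `superlevel_set` and the sample data are ours.

```python
from fractions import Fraction


def superlevel_set(points, values, alpha):
    """{f > alpha} for the step function equal to values[i] on [points[i], points[i+1]).

    Returns a list of disjoint half-open intervals, merging adjacent pieces.
    """
    assert len(points) == len(values) + 1
    out = []
    for (left, right), v in zip(zip(points, points[1:]), values):
        if v > alpha:
            if out and out[-1][1] == left:  # extend the previous interval
                out[-1] = (out[-1][0], right)
            else:
                out.append((left, right))
    return out


points = [0, 1, 2, 3, 4]
values = [Fraction(1), Fraction(3), Fraction(5, 2), Fraction(-1)]
for alpha in [0, 2, 3]:
    print(alpha, superlevel_set(points, values, alpha))
```

Whatever $\alpha$ is, the output has at most as many intervals as the partition has pieces, which is the whole reason step functions are measurable.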


EXERCISES 

> 1. Prove Corollary 17.2. 

2. Prove Corollary 17.3. In which cases, if any, is it necessary to assume that the 
domain D is an interval? 

> 3. Let $f : D \to \mathbb{R}$, where $D$ is measurable. Show that $f$ is measurable if and only if the function $g : \mathbb{R} \to \mathbb{R}$ is measurable, where $g(x) = f(x)$ for $x \in D$ and $g(x) = 0$ for $x \notin D$.

> 4. Prove that $\chi_E$ is measurable if and only if $E$ is measurable.

5. Let $N$ be a nonmeasurable subset of $(0, 1)$, and let $f(x) = x \cdot \chi_N(x)$. Show that $f$ is nonmeasurable, but that each of the sets $\{f = \alpha\}$ is measurable.

6. Suppose that $f : D \to \mathbb{R}$, where $D$ is measurable. Show that $f$ is measurable if and only if $\{f > \alpha\}$ is measurable for each rational $\alpha$.

7. If $f : D \to \mathbb{R}$ is measurable and $g : \mathbb{R} \to \mathbb{R}$ is continuous, show that $g \circ f$ is measurable.

8. Suppose that $D = A \cup B$, where $A$ and $B$ are measurable. Show that $f : D \to \mathbb{R}$ is measurable if and only if $f|_A$ and $f|_B$ are measurable (relative to their respective domains $A$ and $B$, of course).


With just a bit more work, we can improve on Corollary 17.3 and, at the same time, 
confirm a conjecture that is implicit in our discussions of Lebesgue integration. 

Theorem 17.4. If $f : [a, b] \to \mathbb{R}$ is a Riemann integrable function, then $f$ is Lebesgue measurable.

Proof. Recall that $D(f)$, the set of points of discontinuity of $f$, is a Borel set, and so is measurable. The same is true of $C(f) = [a, b] \setminus D(f)$, the set of points where $f$ is continuous. What’s more, if $f$ is Riemann integrable, then $m(D(f)) = 0$, which means that every subset of $D(f)$ is measurable.

Now, let’s compute the inverse image $f^{-1}(U)$ of an open set $U$:

$$f^{-1}(U) = \big(f^{-1}(U) \cap C(f)\big) \cup \big(f^{-1}(U) \cap D(f)\big).$$

The first of these is an open set, relative to $C(f)$; that is, $f^{-1}(U) \cap C(f) = V \cap C(f)$, where $V$ is open in $\mathbb{R}$. Thus, $f^{-1}(U) \cap C(f)$ is even a Borel set. The second set, $f^{-1}(U) \cap D(f)$, is a subset of a set of measure zero, and so is necessarily measurable. Consequently, $f^{-1}(U)$ is measurable. □

Corollary 17.5. Every function $f : [a, b] \to \mathbb{R}$ of bounded variation is measurable.

Please note that the collection of measurable functions is evidently strictly larger than the collection of Riemann integrable functions. Indeed, $\chi_{\mathbb{Q}}$ is measurable (why?), but not Riemann integrable.

We can continue with our “fine tuning” of Corollary 17.2 by introducing another level of classification of functions. What this amounts to is simply naming a class of functions that is intermediate between continuous functions and measurable functions.

We say that $f : D \to \mathbb{R}$ is Borel measurable if $D$ is a Borel set and if, for each real $\alpha$, the set $\{f > \alpha\}$ is a Borel set. Equivalently, $f$ is Borel measurable if the set $f^{-1}(U)$ is always a Borel set for any open set $U$.

Continuous $\iff$ $f^{-1}(\text{open})$ is open,

Borel measurable $\iff$ $f^{-1}(\text{open})$ is a Borel set,

Lebesgue measurable $\iff$ $f^{-1}(\text{open})$ is measurable.

Clearly, a continuous function is Borel measurable, and a Borel measurable function is Lebesgue measurable. It is not hard to see that neither of these statements can be reversed: There are Borel measurable functions that are not continuous, and there are Lebesgue measurable functions that are not Borel measurable. For example, note that monotone functions, step functions, and semicontinuous functions (defined on some interval in $\mathbb{R}$) are actually Borel measurable. And, since we know that there are Lebesgue measurable sets that are not Borel sets, there are necessarily Lebesgue measurable functions that are not Borel measurable. (Why?)

Henceforth, if there is no danger of ambiguity, the word “measurable” (with no additional qualifiers) will be understood to mean “Lebesgue measurable.” In other words, if we are interested in the more restrictive notion of Borel measurability, we will specify the extra qualifier “Borel.”


EXERCISES 

9. Prove that monotone functions are Borel measurable when we take the domain 
D to be an interval. 

10. If / : [ a, b ] — ► R is quasicontinuous, show that / is measurable. Is / Borel 
measurable? 

11. Let $G$ be an open subset of $[0, 1]$ containing the rationals in $[0, 1]$ and having
$m(G) < 1/2$. Prove that $f = \chi_G$ is Borel measurable but is not Riemann integrable
on $[0, 1]$. Moreover, prove that $f$ cannot be equal a.e. to any Riemann integrable
function on $[0, 1]$; in other words, $f$ is substantially different from any Riemann
integrable function.

12. If $f : [a, b] \to \mathbb{R}$ is Lipschitz with constant $K$, and if $E \subset [a, b]$, show that
$m^*(f(E)) \le K\, m^*(E)$. In particular, $f$ maps null sets to null sets.

13. If $f : [a, b] \to \mathbb{R}$ is continuous, prove that the following are equivalent, where
$E \subset [a, b]$:

(a) $m(f(E)) = 0$ whenever $m(E) = 0$.

(b) $f(E)$ is measurable whenever $E$ is measurable. [Hint: Show that $f$ maps $F_\sigma$-sets
to $F_\sigma$-sets.]

> 14. If $f$ is measurable and $B$ is a Borel set, show that $f^{-1}(B)$ is measurable. [Hint:
$\{A : f^{-1}(A) \in \mathcal{M}\}$ is a $\sigma$-algebra containing the open sets.]

> 15. If $f$ is Borel measurable and $B$ is a Borel set, show that $f^{-1}(B)$ is a Borel set.
In particular, this holds for continuous $f$.

16.

(a) If $E$ is a Borel set, show that $E + x$ and $rE$ are Borel sets.

(b) If $E$ is measurable, show that $E + x$ and $rE$ are measurable.

17. If $f, g : \mathbb{R} \to \mathbb{R}$ are Borel measurable, show that $f \circ g$ is Borel measurable.
If $f$ is Borel measurable and $g$ is Lebesgue measurable, show that $f \circ g$ is Lebesgue
measurable.

18. Let $f : [0, 1] \to [0, 1]$ be the Cantor function, and set $g(x) = f(x) + x$.
Prove that:

(a) $g$ is a homeomorphism of $[0, 1]$ onto $[0, 2]$. In particular, $h = g^{-1}$ is continuous.

(b) $g(\Delta)$ is measurable and $m(g(\Delta)) = 1$, where $\Delta$ is the Cantor set. In particular, $g(\Delta)$ contains a nonmeasurable set $A$.

(c) $g$ maps some measurable set onto a nonmeasurable set.

(d) $B = g^{-1}(A)$ is Lebesgue measurable but not a Borel set.

(e) There is a Lebesgue measurable function $F$ and a continuous function $G$ such
that $F \circ G$ is not Lebesgue measurable.


The proof of Theorem 17.4 suggests the following observation: 

Lemma 17.6. If $f$ is measurable, and if $g = f$ a.e., then $g$ is measurable, too.
Moreover, $m(\{g > a\}) = m(\{f > a\})$ for all $a \in \mathbb{R}$.

proof. Suppose that $f : D \to \mathbb{R}$ and that $g : E \to \mathbb{R}$. Then $f = g$ a.e. means
that

$$\{f \ne g\} = (D \,\triangle\, E) \cup \{x \in D \cap E : f(x) \ne g(x)\}$$

is a null set and hence is measurable. Thus,

$$\{f = g\} = \{x \in D \cap E : f(x) = g(x)\} = D \setminus \{f \ne g\}$$

is measurable. And, because $\{f \ne g\}$ is a null set, we also have that
$E = \{f = g\} \cup (E \cap \{f \ne g\})$ is measurable. Finally,

$$\{g > a\} = (\{f > a\} \setminus \{f \ne g\}) \cup (\{g > a\} \cap \{f \ne g\})$$

is measurable since $\{f > a\}$ is measurable and $\{f \ne g\}$ is a null set. For these
same reasons, we get $m(\{g > a\}) = m(\{f > a\})$. □

One of our goals is to characterize the Lebesgue measurable functions in much the
same way that we did the Lebesgue measurable sets. For example, we will show that
a Lebesgue measurable function $f$ is almost everywhere equal to a Borel measurable
function $g$. Along the way, we will actually show that $f$ is “almost” equal to a continuous
function. But notice, please, how very different measurable functions are from
continuous functions: A measurable function may be altered on any set of measure
zero without sacrificing its measurability, while altering a continuous function at even
a single point can easily destroy its continuity. At any rate, the premise here is the
same as before: Lebesgue measurable functions should be well approximated by some
simpler type of function. This project will take some time, but it will be all the easier
to complete if we take advantage of the arithmetic of measurable functions. It is about
time we checked whether the measurable functions form an algebra.

Theorem 17.7. Let $c \in \mathbb{R}$, and let $f, g : D \to \mathbb{R}$ be measurable. Then each of
$cf$, $f + g$, and $fg$ is measurable.

proof. The first claim is nearly obvious:

$$\{cf > a\} = \begin{cases} \{f > a/c\}, & c > 0, \\ \{f < a/c\}, & c < 0, \\ D \text{ or } \emptyset, & c = 0. \end{cases}$$

In any case, the set $\{cf > a\}$ is measurable.

For $f + g$ we use a simple trick: Two real numbers $a, b$ satisfy $a > b$ if and
only if there is some rational $r$ with $a > r > b$. Consequently,

$$\{f + g > a\} = \{f > a - g\} = \bigcup_{r \in \mathbb{Q}} \big(\{f > r\} \cap \{r > a - g\}\big) = \bigcup_{r \in \mathbb{Q}} \big(\{f > r\} \cap \{g > a - r\}\big).$$

Since we have written $\{f + g > a\}$ as a countable union of measurable sets, it is
measurable too.

To prove that $fg$ is measurable, we will use a gimmick that we have seen
before: We will first check that $f^2$ is measurable:

$$\{f^2 > a\} = \begin{cases} \{f > \sqrt{a}\} \cup \{f < -\sqrt{a}\}, & a \ge 0, \\ D, & a < 0. \end{cases}$$

Thus, $f^2$ is measurable. It now follows that $fg = \tfrac{1}{4}\big[(f + g)^2 - (f - g)^2\big]$ is
measurable. □
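The rational-sandwich step in the proof of Theorem 17.7 is constructive: given $a > b$, one can write down an explicit rational between them. The Python sketch below does this with exact arithmetic from the standard `fractions` module. The function names `rational_between` and `sandwich_witness` are our own, not from the text. Applied pointwise, the sketch produces the rational $r$ that places a point $x$ with $f(x) + g(x) > a$ in the set $\{f > r\} \cap \{g > a - r\}$.

```python
from fractions import Fraction
import math


def rational_between(b, a):
    """Return a rational r with b < r < a, assuming b < a.

    Inputs may be ints, floats, or Fractions; floats are converted exactly,
    so no rounding error enters the comparisons.
    """
    lo, hi = Fraction(b), Fraction(a)
    if not lo < hi:
        raise ValueError("need b < a")
    # Choose a denominator q with 1/q < a - b ...
    q = math.floor(1 / (hi - lo)) + 1
    # ... then the first multiple of 1/q strictly above b is still below a.
    return Fraction(math.floor(lo * q) + 1, q)


def sandwich_witness(fx, gx, a):
    """For values f(x), g(x) with f(x) + g(x) > a, return a rational r with
    f(x) > r > a - g(x), so that x lies in {f > r} ∩ {g > a - r}."""
    fx, gx, a = Fraction(fx), Fraction(gx), Fraction(a)
    return rational_between(a - gx, fx)


r = sandwich_witness(0.75, 0.5, 1.2)
print(r)
```

Only countably many candidates $r \in \mathbb{Q}$ are ever needed, which is exactly why the union in the proof is countable.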

Theorem 17.7 allows us to clarify a few more of the details that are implicit in our 
discussion of Lebesgue integration. In particular, it is now clear that the natural building 
blocks for the Lebesgue integral are measurable functions. 

A simple function is a finite linear combination of characteristic functions of
measurable sets. That is, $\varphi$ is simple if

$$\varphi = \sum_{i=1}^{n} a_i \chi_{E_i}, \qquad a_i \text{ real}, \; E_i \text{ measurable}.$$

Clearly, every simple function is measurable. (In truth, what we have actually defined
here is a measurable simple function; some authors allow for nonmeasurable sets $E_i$,
but we are only interested in measurable functions, so we will insist on measurable
sets.) Notice, too, that any step function is a simple function, but not conversely.

Now there are lots of representations for a given simple function. Indeed, we could
introduce bogus terms such as $0 \cdot \chi_E$, or we might split up a given set and so introduce
extra terms: $\chi_E = \chi_{E \cap A} + \chi_{E \setminus A}$. If we want some measure of uniqueness in the
representation, we should rephrase our definition slightly. The key here is that a function
$\varphi$ is simple if (and only if) it takes on only finitely many distinct real values
$a_1, \ldots, a_n \in \mathbb{R}$ and if, for each $i$, the set where each value occurs, $A_i = \{\varphi = a_i\}$, is measurable. If
the $a_i$ are distinct, then the $A_i$ are pairwise disjoint; thus,

$$\varphi = \sum_{i=1}^{n} a_i \chi_{A_i}, \qquad a_i \text{ distinct}, \; A_i \text{ disjoint, measurable}.$$

We will call this representation the standard representation for $\varphi$. Notice that in this
case the sets $A_1, \ldots, A_n$ partition $\mathbb{R}$ into finitely many pairwise disjoint measurable
sets.
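To see concretely why the standard representation is unique while arbitrary representations are not, here is a small Python sketch. It rests on a modeling assumption of our own: the domain is split into finitely many disjoint “atoms” (think of a few disjoint intervals), and each set $E_i$ is given as a set of atom labels. The value of $\varphi = \sum b_i \chi_{E_i}$ on an atom is the sum of those $b_i$ whose $E_i$ contain it, and grouping atoms by value recovers the sets $A_i = \{\varphi = a_i\}$. All names here are hypothetical.

```python
from collections import defaultdict


def standard_representation(terms, atoms):
    """Collapse an arbitrary representation into the standard one.

    Hypothetical atom model: terms is a list of (b_i, E_i), where each E_i is a
    set of atom labels (the E_i may overlap, and b_i may be 0); atoms lists all
    atom labels in the domain.  Returns a dict mapping each distinct value a to
    the set {phi = a}.
    """
    levels = defaultdict(set)
    for atom in atoms:
        value = sum(b for b, E in terms if atom in E)  # phi on this atom
        levels[value].add(atom)
    return dict(levels)


atoms = ["p", "q", "r", "s"]
phi = [(2, {"p", "q"}), (3, {"q", "r"})]
# The same function again: a bogus 0-term, and chi_E split as chi_{E∩A} + chi_{E\A}.
phi_alt = [(0, {"s"}), (2, {"p"}), (2, {"q"}), (3, {"q", "r"})]

print(standard_representation(phi, atoms))
print(standard_representation(phi, atoms) == standard_representation(phi_alt, atoms))
```

Exact integer (or `Fraction`) coefficients are used on purpose: with floating-point coefficients, two sums that are mathematically equal could round differently and land in separate level sets.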
EXERCISES


19. If $f, g : D \to \mathbb{R}$ are measurable, show that $\{f > g\}$ is measurable.

20. Let $f_n : D \to \mathbb{R}$ be measurable for each $n$, and suppose that $f_n(x) \le f_{n+1}(x)$
for each $n$ and each $x \in D$. If $f(x) = \lim_{n \to \infty} f_n(x)$ exists (in $\mathbb{R}$) for each $x \in D$,
prove that $f$ is measurable. Thus, the measurable functions are closed under monotone
limits.

21. Let $f$ be a nonnegative, bounded, measurable function on $[a, b]$ with $0 \le f \le M$. Let

$$E_{n,k} = \left\{ \frac{kM}{2^n} \le f < \frac{(k+1)M}{2^n} \right\}$$

for each $n = 1, 2, \ldots$ and $k = 0, 1, \ldots, 2^n$, and set

$$\varphi_n = \sum_{k=0}^{2^n} \frac{kM}{2^n}\, \chi_{E_{n,k}}.$$

Prove that $0 \le \varphi_n \le \varphi_{n+1} \le f$ and that $0 \le f - \varphi_n \le 2^{-n} M$ for each $n$. Thus, $(\varphi_n)$
is a sequence of simple functions that converges uniformly to $f$ on $[a, b]$. [Hint:
Notice that $E_{n,k} = E_{n+1,2k} \cup E_{n+1,2k+1}$.]

22. Check that the conclusion of Theorem 17.7 still holds (with the same proof)
if we everywhere replace the word “measurable” by the words “Borel measurable”
(and “measurable set” by “Borel set,” of course).

23. If $f \in BV[a, b]$, show that $f$ is Borel measurable.

24. Does Lemma 17.6 hold for Borel measurable functions? How about if we take
“a.e.” to mean “except for a Borel set of measure zero”?


Extended Real-Valued Functions

We must occasionally consider functions that take on the values $\pm\infty$, that is, functions
with values in the extended real numbers $\overline{\mathbb{R}} = [-\infty, \infty]$. A good example of
a situation where infinite values are virtually unavoidable is when considering derivatives;
even relatively tame functions, say monotone functions, can easily have infinite
derivatives.

But at least we do not have to alter our definition of measurability. Given
$f : D \to [-\infty, \infty]$, where $D$ is a measurable subset of $\mathbb{R}$, we still say that $f$ is measurable if,
for each real $a$, the set

$$\{f > a\} = \{x \in D : f(x) > a\} = f^{-1}((a, \infty])$$

is measurable. Note that if $f$ is measurable, then so are $\{f = +\infty\} = \bigcap_{n=1}^{\infty} \{f > n\}$
and $\{f = -\infty\} = D \setminus \{f > -\infty\} = D \setminus \bigcup_{n=1}^{\infty} \{f > -n\}$. In particular, the set where
$f$ is finite is measurable: $\{-\infty < f < +\infty\} = D \setminus (\{f = +\infty\} \cup \{f = -\infty\})$.





Since we have taken the same formal definition for measurability as in the real-valued
case, the various equivalent definitions given by Lemma 17.1 are still valid for extended
real-valued functions. In fact, even Corollary 17.2 is still good, provided that we take
sets of the form $(a, +\infty]$ and $[-\infty, a)$ as “neighborhoods” of $+\infty$ and $-\infty$ (respectively), and
this is just what we will do. Thus, the open sets in $\overline{\mathbb{R}}$ are the open sets in $\mathbb{R}$, together
with neighborhoods of $-\infty$ and $+\infty$ and unions of such sets. It follows that the Borel
subsets of $\overline{\mathbb{R}}$ are the Borel sets in $\mathbb{R}$, together with $\{-\infty\}$, $\{+\infty\}$, and unions of such
sets.

Defining an appropriate arithmetic for extended real-valued functions is problematic:
We need to define expressions such as $\infty \pm \infty$ and $0 \cdot (\pm\infty)$. Convention dictates that

$$0 \cdot (\pm\infty) = 0, \qquad \infty \cdot (\pm\infty) = \pm\infty, \qquad -\infty \cdot (\pm\infty) = \mp\infty,$$

$$\infty + \infty = \infty, \qquad -\infty - \infty = -\infty,$$

while expressions such as $\infty - \infty$ and $-\infty + \infty$ are ambiguous (and should be avoided).
With some care, however, we can still patch together an amended version of Theorem
17.7 for extended real-valued functions. We will relegate the details to the exercises.
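IEEE floating-point arithmetic (as in Python's `float`) already follows most of these conventions, but it evaluates $0 \cdot \infty$ as NaN and silently evaluates $\infty - \infty$ as NaN as well. Here is a minimal sketch of two helpers that enforce the conventions above instead. The names `ext_mul` and `ext_add` are our own, hypothetical choices.

```python
import math


def ext_mul(x: float, y: float) -> float:
    """Product in [-inf, inf] using the convention 0 * (±inf) = 0."""
    if x == 0 or y == 0:
        return 0.0  # IEEE would give nan for 0 * inf
    return x * y    # IEEE already gives inf*inf = inf, -inf*inf = -inf, etc.


def ext_add(x: float, y: float) -> float:
    """Sum in [-inf, inf]; rejects the ambiguous forms inf - inf and -inf + inf."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("ambiguous form: inf - inf")
    return x + y


inf = math.inf
print(0.0 * inf, ext_mul(0.0, inf))  # IEEE nan versus the convention's 0
print(ext_mul(-inf, inf), ext_add(-inf, -inf))
```

Raising an error, rather than choosing a value, mirrors the text's advice that $\infty - \infty$ should be avoided. Exercises 27 and 28 show when a fixed choice is harmless.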

In actual practice, the extended real-valued functions that we will encounter will
be allowed to take infinite values only on sets of measure zero. We say that a measurable
function $f : D \to [-\infty, \infty]$ is finite almost everywhere if it happens that
$m(\{|f| = \infty\}) = 0$. If $f$ and $g$ are finite a.e., then any ambiguities arising from
expressions such as $f + g$ occur only on sets of measure zero. This means that we are
free to define $f + g$ in any way we please in the uncertain cases (see Lemma 17.6).
Again, we will leave the details to the exercises.


EXERCISES 

25. Suppose that $D = A \cup B$, where $A$ and $B$ are measurable. Show that
$f : D \to [-\infty, \infty]$ is measurable if and only if both $f|_A$ and $f|_B$ are measurable. In
particular, if $D$ is measurable, then $f : D \to [-\infty, \infty]$ is measurable if and only
if both of the sets $\{f = +\infty\}$ and $\{f = -\infty\}$ are measurable and $f|_{\{|f| < \infty\}}$ is
measurable.

26. Suppose that $f, g : D \to [-\infty, \infty]$ are measurable. Show that $fg$ is always
measurable, where we take $0 \cdot (\pm\infty) = 0$.

27. Suppose that $f, g : D \to [-\infty, \infty]$ are measurable. Show that $f + g$ is
measurable, provided that we define $f(x) + g(x)$ to be the same value, say $\delta$, whenever
it is of the form $\infty - \infty$ or $-\infty + \infty$.

28. Suppose that $f, g : D \to [-\infty, \infty]$ are measurable and finite a.e.; that is,
$m(\{f = \pm\infty\}) = 0 = m(\{g = \pm\infty\})$. Show that $f + g$ is measurable no matter
how it is defined when it has the form $\infty - \infty$ or $-\infty + \infty$.

29. Let $f : [a, b] \to [-\infty, \infty]$ be measurable and finite a.e. Given $\varepsilon > 0$, show
that there is some finite $M$ such that $m(\{|f| > M\}) < \varepsilon$.
Sequences of Measurable Functions

We now know that the collection of measurable functions sharing a common domain
forms a vector space and an algebra of functions. But of course we can’t stop there! We
want max’s and min’s and absolute values, too. With just a little extra effort, we can 
handle all of these cases, and more, at one and the same time. The key here is that the 
collection of measurable functions is closed under monotone limits, and, as we’ll see, 
this means that the collection is closed under all pointwise limits. 

Throughout this section, unless otherwise specified, we will assume that all functions
are defined on a common measurable domain $D$, and that all functions take values in
the extended real numbers $\overline{\mathbb{R}} = [-\infty, +\infty]$.

Theorem 17.8. Let $(f_n)$ be a sequence (finite or infinite) of measurable functions.
Then both $\sup_n f_n$ and $\inf_n f_n$ are measurable.

proof. If $a \in \mathbb{R}$, then $\sup_n f_n(x) > a$ means that $f_n(x) > a$ for some $n$, and
conversely. That is,

$$\Big\{\sup_n f_n > a\Big\} = \bigcup_{n=1}^{\infty} \{f_n > a\},$$

which is measurable, provided that every $f_n$ is measurable. The argument for
$\inf_n f_n$ is easy, too; for example,

$$\Big\{\inf_n f_n \ge a\Big\} = \bigcap_{n=1}^{\infty} \{f_n \ge a\}.$$

Alternatively, note that $\inf_n f_n = -\sup_n (-f_n)$, and so inf’s are measurable because
sup’s are.

The arguments for $\max\{f_1, \ldots, f_n\}$ and $\min\{f_1, \ldots, f_n\}$ are essentially the
same (just take finite unions and intersections). □

Corollary 17.9. If $f$ and $g$ are measurable, then $\max\{f, g\}$, $\min\{f, g\}$, $f^+ = \max\{f, 0\}$,
$f^- = -\min\{f, 0\}$, and $|f| = \max\{f, -f\} = f^+ + f^-$ are all measurable.

Since $f = f^+ - f^-$, we actually have something more:

Corollary 17.10. $f$ is measurable if and only if both $f^+$ and $f^-$ are measurable.

It also follows from Theorem 17.8 that the collection of measurable functions is
closed under pointwise limits, and this is the best evidence we have that the class of
measurable functions is quite large, surely larger than any we have seen thus far.

Corollary 17.11. Let $(f_n)$ be a sequence of measurable functions. Then both
$\limsup_{n \to \infty} f_n$ and $\liminf_{n \to \infty} f_n$ are measurable.
proof. All we have to do is write each in terms of inf’s and sup’s:

$$\limsup_{n \to \infty} f_n = \inf_n \Big( \sup_{k \ge n} f_k \Big) \quad \text{and} \quad \liminf_{n \to \infty} f_n = \sup_n \Big( \inf_{k \ge n} f_k \Big). \qquad \square$$

Corollary 17.12. If $(f_n)$ is a sequence of measurable functions, and if
$f(x) = \lim_{n \to \infty} f_n(x)$ exists (in $\overline{\mathbb{R}}$) for all $x \in D$, then $f$ is measurable. In fact, $f$ is
measurable even if we only have $f(x) = \lim_{n \to \infty} f_n(x)$ a.e. on $D$ (regardless of
how $f$ might be defined otherwise).


EXERCISES 

> 30. Prove Corollary 17.12. 

31. Let $(f_n)$ be a sequence of measurable functions, all defined on some measurable
set $D$. Show that the set $C = \{x \in D : \lim_{n \to \infty} f_n(x) \text{ exists}\}$ is measurable. [Hint:
$C$ is the set where $(f_n(x))$ is Cauchy.]

32. Check that the conclusion of Theorem 17.8 holds (with the same proof) if 
“measurable” is everywhere interpreted as “Borel measurable” (and “measurable set” 
as “Borel set,” of course). Do the same for the four corollaries. What modifications, 
if any, are needed in Corollary 17.12? 

33. If $f : (a, b) \to \mathbb{R}$ is differentiable, show that $f'$ is Borel measurable. If $f$ is
only differentiable a.e., show that $f'$ is still Lebesgue measurable. [Hint: Write $f'$
as the limit of a sequence of continuous functions.]


We say that $(f_n)$ converges pointwise a.e. to $f$ if $f(x) = \lim_{n \to \infty} f_n(x)$ for almost
every $x$ in $D$, that is, if $(f_n)$ converges pointwise to $f$ on $D \setminus E$, where $m(E) = 0$. Thus,
Corollary 17.12 says that the collection of measurable functions is closed even under
pointwise a.e. limits.

Remarkably, pointwise a.e. convergence on a set of finite measure is actually equiva- 
lent to a slightly stronger form of convergence. 

Egorov’s Theorem 17.13. Let $(f_n)$ be a sequence of measurable functions converging
pointwise a.e. to a real-valued function $f$ on a measurable set $D$ of finite
measure. Then, given $\varepsilon > 0$, there is a measurable set $E \subset D$ such that $m(E) < \varepsilon$
and such that $(f_n)$ converges uniformly to $f$ on $D \setminus E$.

proof. We may obviously assume that $f_n \to f$ everywhere on $D$. Now, for each
$n$ and $k$, consider

$$E(n, k) = \bigcup_{m=n}^{\infty} \left\{ x \in D : |f_m(x) - f(x)| \ge \frac{1}{k} \right\}.$$

If $k$ is fixed, then the sets $E(n, k)$ clearly decrease as $n$ increases; moreover,
$\bigcap_{n=1}^{\infty} E(n, k) = \emptyset$ since $f_n \to f$ everywhere on $D$. (Why?)
Since $m(D) < \infty$, we have $m(E(n, k)) \to 0$ as $n \to \infty$. Consequently, we
may choose a subsequence $(n_k)$ for which $m(E(n_k, k)) < \varepsilon/2^k$. (How?) Now, if
we set $E = \bigcup_{k=1}^{\infty} E(n_k, k)$, then $m(E) < \varepsilon$. What’s more, for $x \notin E$, we have
$x \notin E(n_k, k)$ for any $k$ and, in particular, $|f_m(x) - f(x)| < 1/k$ for all $m \ge n_k$.
Thus, $f_n \rightrightarrows f$ on $D \setminus E$. □
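The proof can be followed numerically in a concrete case. As an illustrative example of our own (not from the text), take $D = [0, 1)$, $f_m(x) = x^m$, and $f = 0$. Since $x^m$ decreases in $m$, we have $E(n, k) = \{x \in [0, 1) : x^n \ge 1/k\} = [k^{-1/n}, 1)$, so $m(E(n, k)) = 1 - k^{-1/n}$. The Python sketch below picks, for each $k$, the smallest $n_k$ with $m(E(n_k, k)) < \varepsilon/2^k$, exactly as in the proof. The measures are evaluated in floating point, so they are approximations.

```python
import math


def measure_E(n: int, k: int) -> float:
    """m(E(n, k)) for the example f_m(x) = x**m on [0, 1), f = 0: E(n, k) = [k**(-1/n), 1)."""
    return 1.0 - k ** (-1.0 / n)


def choose_n(k: int, eps: float) -> int:
    """Smallest n with m(E(n, k)) < eps / 2**k (m(E(n, k)) decreases as n grows)."""
    target = eps / 2 ** k
    # Closed-form estimate from 1 - k**(-1/n) < target  <=>  n > log(k) / -log(1 - target).
    n = max(1, math.floor(math.log(k) / -math.log1p(-target)))
    while measure_E(n, k) >= target:               # estimate too small: step up
        n += 1
    while n > 1 and measure_E(n - 1, k) < target:  # estimate too large: step down
        n -= 1
    return n


eps = 0.1
ns = {k: choose_n(k, eps) for k in range(1, 9)}
total = sum(measure_E(n, k) for k, n in ns.items())
for k, n in ns.items():
    print(f"k={k}: n_k={n}, m(E(n_k, k)) = {measure_E(n, k):.2e} < {eps / 2**k:.2e}")
print("upper bound for m(E) from these k:", total)
```

In this example the sets $E(n_k, k) = [k^{-1/n_k}, 1)$ are nested intervals ending at $1$. Off their union, that is, on $[0, c)$ with $c = \inf_k k^{-1/n_k}$, the convergence $x^m \to 0$ is indeed uniform.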

We say that $(f_n)$ converges almost uniformly to $f$ on $D$ if, for each $\varepsilon > 0$, there is
a measurable subset $E$ of $D$, with $m(E) < \varepsilon$, such that $(f_n)$ converges uniformly to $f$
on $D \setminus E$. Now it is easy to see that almost uniform convergence implies convergence
pointwise almost everywhere; thus, on a set of finite measure, Egorov’s theorem tells us
that the two notions are equivalent. The requirement that $f$ be real-valued (or, at worst,
finite a.e.) cannot be dropped, nor can the requirement that $m(D) < \infty$, in general. We
will leave the proofs of these various claims to the exercises.


EXERCISES 

34. Give an example showing that the requirement that $f$ be finite, at least a.e.,
cannot be dropped from the statement of Egorov’s theorem.

> 35. Give an example showing that the requirement that $m(D) < \infty$ cannot be
dropped from Egorov’s theorem.

> 36. If $(f_n)$ converges almost uniformly to $f$, prove that $(f_n)$ converges almost
everywhere to $f$. [Hint: For each $k$, choose a set $E_k$ such that $m(E_k) < 1/k$ and
$f_n \rightrightarrows f$ off $E_k$. Then $m\big(\bigcap_{k=1}^{\infty} E_k\big) = 0$.]

37. Clearly, if $(f_n)$ converges uniformly to $f$ except, possibly, on a set of measure
zero, then $(f_n)$ converges almost uniformly to $f$. On the other hand, give an example
showing that almost uniform convergence does not imply uniform convergence except
on a set of measure zero.

38. Let $(f_n)$ be a sequence of measurable functions converging pointwise a.e. to a
real-valued function $f$ on a measurable set $D$ of arbitrary measure. Show that there
exist measurable sets $E_1 \subset E_2 \subset \cdots \subset D$ such that $(f_n)$ converges uniformly to $f$
on each $E_k$ and $m\big(D \setminus \bigcup_{k=1}^{\infty} E_k\big) = 0$.


Approximation of Measurable Functions 

Our long-term goal is to improve on the result in Corollary 17.12 and to actually 
characterize measurable functions as the almost everywhere limits of certain “nice” 
functions. The first step in this process is extremely important. Watch closely. Better 
still, draw a few pictures! 

Basic Construction 17.14. If $f : D \to [0, \infty]$ is a nonnegative measurable
function, then we can find an increasing sequence of nonnegative simple functions
$(\varphi_n)$ with $0 \le \varphi_1 \le \varphi_2 \le \cdots \le f$, such that $(\varphi_n)$ converges pointwise to $f$
everywhere on $D$, and such that $(\varphi_n)$ converges uniformly to $f$ on any set where
$f$ is bounded.
proof. For each $n = 1, 2, \ldots$, define $F_n = \{x \in D : f(x) \ge 2^n\}$ and

$$E_{n,k} = \{x \in D : k2^{-n} \le f(x) < (k+1)2^{-n}\} \quad \text{for } k = 0, 1, \ldots, 2^{2n} - 1.$$

Since $f$ is measurable, so are $F_n$ and $E_{n,k}$. Now, for each $n = 1, 2, \ldots$, define a
(measurable) simple function by

$$\varphi_n = 2^n \chi_{F_n} + \sum_{k=0}^{2^{2n}-1} k2^{-n} \chi_{E_{n,k}}.$$

Please note that $\varphi_n$ vanishes outside of $D$, that $0 \le \varphi_n \le f$, and that $0 \le f - \varphi_n < 2^{-n}$
on the set $\{f < 2^n\}$. Since $D = \bigcup_{n=1}^{\infty} \{f < 2^n\} \cup \{f = \infty\}$, and since
$\{f < 2^n\} \subset \{f < 2^{n+1}\}$ for any $n$, we get that $\varphi_n \to f$ pointwise on $D$ (notice
that $\varphi_n = 2^n$ on the set $\{f = \infty\}$). What’s more, it is obvious that $\varphi_n \rightrightarrows f$ on any
set of the form $\{f \le M\}$. (Why?)

All that remains is to check that the $\varphi_n$ increase. But

$$E_{n,k} = \{2k/2^{n+1} \le f < (2k + 2)/2^{n+1}\} = E_{n+1,2k} \cup E_{n+1,2k+1}.$$

On $E_{n+1,2k}$ we have $\varphi_n = k/2^n = 2k/2^{n+1} = \varphi_{n+1}$, while on $E_{n+1,2k+1}$ we have
$\varphi_n = k/2^n < (2k + 1)/2^{n+1} = \varphi_{n+1}$. Finally, on the set

$$F_n = \{f \ge 2^n\} = \{f \ge 2^{2n+1}/2^{n+1}\},$$

it is clear that $\varphi_n = 2^n = 2^{2n+1}/2^{n+1} \le \varphi_{n+1}$. Thus, $\varphi_n \le \varphi_{n+1}$ everywhere
on $D$. □
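At a single point, the construction is nothing more than rounding down to the grid of multiples of $2^{-n}$ and capping at $2^n$. Here is a minimal Python sketch that evaluates $\varphi_n$ at a point where $f$ takes the value $y \in [0, \infty]$. The function name `phi` is our own.

```python
import math


def phi(n: int, y: float) -> float:
    """Value of the n-th approximant at a point where f = y, with 0 <= y <= inf."""
    if y < 0:
        raise ValueError("the Basic Construction needs f >= 0")
    if y >= 2 ** n:              # the point lies in F_n = {f >= 2^n}
        return float(2 ** n)
    k = math.floor(y * 2 ** n)   # the point lies in E_{n,k}: k/2^n <= y < (k+1)/2^n
    return k / 2 ** n


for y in [0.3, 2.7, math.inf]:
    print(y, [phi(n, y) for n in range(1, 6)])
```

Each printed list increases and never exceeds $y$. Once $2^n > y$, the error is below $2^{-n}$, and at $y = \infty$ the values $2^n$ still increase to $f = \infty$.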

Given a measurable function $f : D \to [-\infty, \infty]$, we apply the basic construction
to each of $f^+$ and $f^-$ to conclude:

Corollary 17.15. If $f : D \to [-\infty, \infty]$ is measurable, then there exists a sequence
of simple functions $(\varphi_n)$ such that $0 \le |\varphi_1| \le |\varphi_2| \le \cdots \le |f|$ and $\varphi_n \to f$
everywhere on $D$. Moreover, $\varphi_n \rightrightarrows f$ on any set where $|f|$ is bounded.

It is interesting to note that this construction works for any function
$f : D \to [-\infty, \infty]$, provided that we no longer require a simple function to be based on measurable
sets. In other words, the measurability of $f$ was only needed to ensure the
measurability of the $\varphi_n$.

Corollary 17.16. Let $f : D \to [-\infty, \infty]$, where $D$ is measurable. Then $f$ is
measurable if and only if $f$ is the pointwise (everywhere) limit of a sequence of
(measurable) simple functions.


EXERCISES 

39. Modify the Basic Construction in the following way: For each $n$ and $k$, choose
a Borel subset of $E_{n,k}$ of equal measure, call it $A_{n,k}$, and choose a Borel subset of
$F_n$ of equal measure, and call it $B_n$. Now define $\psi_n = 2^n \chi_{B_n} + \sum_k k2^{-n} \chi_{A_{n,k}}$.
Note that $\psi_n$ is Borel measurable. Argue that $(\psi_n)$ converges pointwise to $f$ on $D$
except, possibly, on a set of measure zero.

40. If $f$ is Lebesgue measurable, prove that there is a Borel measurable function $g$
such that $f = g$ except, possibly, on a Borel set of measure zero. [Hint: Every null
set is contained in a Borel set of measure zero.]


The point to Corollary 17.16 is that the collection of measurable functions is the 
closure of the (measurable) simple functions under pointwise limits. We could have 
easily taken this as our definition of measurability. 

If we consider measurable functions defined on an interval, it is possible to modify 
our construction to involve step functions, or even continuous functions, in place of 
simple functions (at the price of an extra “a.e.” here and there). This is the next item on 
the agenda. 

For the remainder of this section, then, we will suppose that we are given a measurable,
finite almost everywhere function $f : [a, b] \to [-\infty, \infty]$ and an $\varepsilon > 0$.

Lemma 17.17. There is a finite constant $K$ (depending on $\varepsilon$) such that $|f| \le K$
except, possibly, on a set of measure less than $\varepsilon/2$.

proof. The sets $\{|f| > n\}$ decrease as $n$ increases, each has finite measure, and
$\bigcap_{n=1}^{\infty} \{|f| > n\} = \{f = \pm\infty\}$ is a set of measure zero. Thus, $m(\{|f| > n\}) \to 0$
as $n \to \infty$. In particular, $m(\{|f| > n\}) < \varepsilon/2$ for some $n$, and we may take $K = n$. □

The next step follows immediately from our Basic Construction.

Lemma 17.18. There is a simple function $\varphi$, vanishing outside of $[a, b]$, such
that $|\varphi| \le |f|$, and such that $|f - \varphi| < \varepsilon$ except, possibly, on the set where $|f| > K$
(a set of measure less than $\varepsilon/2$).

At this point, / has been well approximated by a simple function (p based on mea- 
surable sets. We next replace each of these underlying measurable sets by “nice” sets, 
and so build a new approximation for /. As with the Basic Construction itself, you may 
find it helpful to sketch a few pictures to go along with the refinements presented below. 

Lemma 17.19. There is a continuous function $g$ on $\mathbb{R}$, vanishing outside of
$[a, b]$, such that $g = \varphi$ except, possibly, on a set of measure less than $\varepsilon/2$.

proof. Write $\varphi = \sum_{i=1}^{n} a_i \chi_{A_i}$, where each $a_i \in \mathbb{R}$, and where $A_1, \ldots, A_n$ are
pairwise disjoint measurable subsets of $[a, b]$ with $\bigcup_{i=1}^{n} A_i = [a, b]$. For each $i$,
choose a closed set $F_i \subset A_i \cap (a, b)$ such that $m(A_i \setminus F_i) < \varepsilon/(2n)$, and consider
the function $\sum_{i=1}^{n} a_i \chi_{F_i}$. We clearly have $\sum_{i=1}^{n} a_i \chi_{F_i} = \varphi$ on the set $F = \bigcup_{i=1}^{n} F_i$,
where $[a, b] \setminus F = \bigcup_{i=1}^{n} (A_i \setminus F_i)$ is a set of measure less than $\varepsilon/2$.

To finish the proof, then, it suffices to show that the function $g$ defined by
$g = a_i$ on the set $F_i$ for $i = 1, \ldots, n$, that is, $g = \varphi|_F$, can be extended to a
continuous function on $\mathbb{R}$ that vanishes outside $[a, b]$. The fact that $F \cup \{a, b\}$ is
closed makes this easy: Since the open set $G = \mathbb{R} \setminus (F \cup \{a, b\})$ can be written as
the countable union of pairwise disjoint open intervals (with endpoints in $F \cup \{a, b\}$),
we may extend $g$ linearly on each of the constituent intervals in $G$, taking $g = 0$
on $(-\infty, a]$ and $[b, \infty)$. (How?) It is easy to see that this defines $g$ as a continuous
function on $\mathbb{R}$ (see Exercise 41). □
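The extension step is easy to carry out when $F$ is particularly simple. The Python sketch below makes a simplifying assumption of our own: $F$ is a finite union of disjoint closed intervals $[l_i, r_i] \subset (a, b)$ on which $g$ takes constant values $c_i$. The sketch interpolates linearly across each gap, including the gaps next to $a$ and $b$, and sets $g = 0$ off $(a, b)$. In the general case, the finitely many gaps become countably many. The name `extend_linearly` is hypothetical.

```python
from bisect import bisect_right


def extend_linearly(a, b, pieces):
    """pieces: list of (l, r, c) with a < l <= r < b, disjoint intervals, g = c on [l, r].

    Returns the continuous function g on R that equals c on each [l, r],
    is linear on each gap between consecutive knots, and is 0 outside (a, b).
    """
    xs, ys = [a], [0.0]                        # knots and their values; g(a) = 0
    for l, r, c in sorted(pieces):
        if not (xs[-1] < l <= r):
            raise ValueError("intervals must be disjoint and lie inside (a, b)")
        xs += [l, r]
        ys += [c, c]
    if not xs[-1] < b:
        raise ValueError("intervals must lie inside (a, b)")
    xs.append(b)
    ys.append(0.0)                             # g(b) = 0

    def g(x):
        if x <= a or x >= b:
            return 0.0
        j = bisect_right(xs, x) - 1            # xs[j] <= x < xs[j + 1]
        x0, x1, y0, y1 = xs[j], xs[j + 1], ys[j], ys[j + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return g


g = extend_linearly(0.0, 1.0, [(0.2, 0.4, 3.0), (0.6, 0.7, -1.0)])
print([round(g(x), 3) for x in (-1, 0.1, 0.3, 0.5, 0.65, 0.85, 2)])
```

Because every gap is bridged by a line segment joining the values at its two endpoints, $g$ is continuous, and its values on each gap stay between those endpoint values. This is the bound $\sup |g| \le \sup_F |g|$ that Exercise 41 asks for.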

Combining these results gives us Borel’s theorem (see also Exercise 43). 

Theorem 17.20. Let $f : [a, b] \to [-\infty, \infty]$ be measurable and finite a.e. Then,
for each $\varepsilon > 0$, there is a continuous function $g$ on $[a, b]$ such that $|f - g| < \varepsilon$
except, possibly, on a set of measure less than $\varepsilon$. If $k \le f \le K$, for some constants
$k$ and $K$, then we can arrange for $k \le g \le K$, too.

proof. The first assertion follows easily from the previous three lemmas. To
prove the second assertion, note that if $k \le f \le K$, then the function

$$\tilde{g} = K \wedge (k \vee g) = \min\{K, \max\{k, g\}\}$$

is continuous, satisfies $k \le \tilde{g} \le K$, and, in addition, has $|f - \tilde{g}| \le |f - g|$.
(Why?) □
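The clamping step in the proof is easiest to understand one point at a time. Replacing $g(x)$ by the nearest point of $[k, K]$ can only move it closer to $f(x)$, because $f(x)$ itself lies in $[k, K]$. Here is a quick randomized check in Python. The name `clamp` and the sampling ranges are our own choices.

```python
import random


def clamp(y: float, k: float, K: float) -> float:
    """K ∧ (k ∨ y) = min{K, max{k, y}}: the point of [k, K] nearest to y."""
    return min(K, max(k, y))


random.seed(0)
k, K = -1.0, 2.0
for _ in range(10_000):
    f_val = random.uniform(k, K)      # a value of f, so k <= f <= K
    g_val = random.uniform(-10, 10)   # an arbitrary value of the continuous g
    g_tilde = clamp(g_val, k, K)
    assert k <= g_tilde <= K
    assert abs(f_val - g_tilde) <= abs(f_val - g_val)
print("clamping never increased |f - g| in 10,000 trials")
```

Since $\min$ and $\max$ of continuous functions are continuous, applying `clamp` pointwise to a continuous $g$ yields a continuous $\tilde{g}$.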

It is convenient to use the shorthand $m\{|f - g| \ge \varepsilon\} < \varepsilon$ in place of the more
cumbersome phrase “$|f - g| < \varepsilon$ except, possibly, on a set of measure less than $\varepsilon$.”
Similar abbreviations could be used to shorten other statements; for example,
$m\{g \ne \varphi\} < \varepsilon$ is an obvious replacement for “$g = \varphi$ except, possibly, on a set of measure less
than $\varepsilon$.”


EXERCISES 

> 41. Let $E$ be a closed subset of $\mathbb{R}$, and let $f : E \to \mathbb{R}$ be continuous. Prove that $f$
extends to a continuous function on all of $\mathbb{R}$. That is, prove that there is a continuous
function $g : \mathbb{R} \to \mathbb{R}$ such that $g(x) = f(x)$ for $x \in E$. Moreover, $g$ can be chosen
to satisfy $\sup_{x \in \mathbb{R}} |g(x)| \le \sup_{x \in E} |f(x)|$.

42.

(a) Given a compact set $K$ and a bounded open set $U \supset K$, show that there is a
continuous function $f : \mathbb{R} \to \mathbb{R}$ such that $f = 1$ on $K$, $f = 0$ on $U^c$, and
$0 \le f \le 1$ everywhere.

(b) Given a measurable set $E$ with $m(E) < \infty$, and $\varepsilon > 0$, show that there is a
continuous function $f : \mathbb{R} \to \mathbb{R}$, vanishing outside some compact set, such that
$0 \le f \le 1$ everywhere, and $m\{f \ne \chi_E\} < \varepsilon$.

43. Let $f : [a, b] \to [-\infty, \infty]$ be measurable and finite a.e., and let $\varepsilon > 0$.
Modify the proof of Borel’s theorem to show that there is a polynomial $p$ such that
$m\{|f - p| \ge \varepsilon\} < \varepsilon$.

44. Let $f : [a, b] \to [-\infty, \infty]$ be measurable and finite a.e. Prove that there is
a sequence of continuous functions $(g_n)$ on $[a, b]$ such that $g_n \to f$ a.e. on $[a, b]$.
In fact, the $g_n$ can be taken to be polynomials. [Hint: For each $n$, choose $g_n$ so that
$E_n = \{|f - g_n| \ge 2^{-n}\}$ has $m(E_n) < 2^{-n}$. Now argue that $g_n \to f$ off the set
$E = \limsup_{n \to \infty} E_n$.]





45. Let $f : [a, b] \to \mathbb{R}$ be measurable and finite a.e., and let $\varepsilon > 0$. Show that
there is a continuous function $g$ on $[a, b]$ with $m\{f \ne g\} < \varepsilon$. [Hint: Combine
Exercises 41 and 44 and Egorov’s theorem to find continuous functions $(g_n)$ and a
closed set $F$ with $m([a, b] \setminus F) < \varepsilon$ and $g_n \rightrightarrows f$ on $F$. Now argue that $f|_F$ extends
to a continuous function $g$.]

46. (Luzin’s Theorem) Show that $f : \mathbb{R} \to \mathbb{R}$ is measurable if and only if, for
each $\varepsilon > 0$, there is a measurable set $E$ with $m(E) < \varepsilon$ such that the restriction of
$f$ to $\mathbb{R} \setminus E$ is continuous (relative to $\mathbb{R} \setminus E$).

47. Show that $f : \mathbb{R} \to \mathbb{R}$ is measurable if and only if, for each $\varepsilon > 0$, there is a
continuous function $g : \mathbb{R} \to \mathbb{R}$ such that $m\{f \ne g\} < \varepsilon$.

48. Luzin’s theorem does not say that a measurable function is continuous on the
complement of a null set. Indeed, show that there is a measurable set $K \subset [0, 1]$
such that $\chi_K$ is everywhere discontinuous in $[0, 1] \setminus N$ for any null set $N$.

49.

(a) Given a simple function $\varphi : [a, b] \to \mathbb{R}$ and $\varepsilon > 0$, show that there is a step
function $g$ on $[a, b]$ such that $m\{g \ne \varphi\} < \varepsilon$. [Hint: Write $\varphi = \sum_{i=1}^{n} a_i \chi_{A_i}$.
For each $i$, choose a finite union of intervals $B_i$ with $m(A_i \,\triangle\, B_i) < \varepsilon/n$. Now
let $g = \sum_{i=1}^{n} a_i \chi_{B_i}$.]

(b) Let $f : [a, b] \to [-\infty, \infty]$ be measurable and finite a.e., and let $\varepsilon > 0$. Show
that there is a step function $g$ on $[a, b]$ such that $m\{|f - g| \ge \varepsilon\} < \varepsilon$. If, in
addition, $k \le f \le K$, show that $g$ can be chosen to satisfy $k \le g \le K$, too.

50. Let $(f_n)$ be a sequence of real-valued measurable functions on $[0, 1]$. Show
that there exists a sequence of positive real numbers $(a_n)$ such that $a_n f_n \to 0$ a.e.

The various approximation results in this section, along with certain of the exercises, 
allow us to summarize our findings: 

$f$ is measurable and finite a.e.

$\iff$ $f$ is the limit of a sequence of (measurable) simple functions;

$\iff$ $f$ is the a.e. limit of a sequence of step functions;

$\iff$ $f$ is the a.e. limit of a sequence of continuous functions;

$\iff$ given $\varepsilon > 0$, there is a continuous function $g$ such that $m\{f \ne g\} < \varepsilon$.


Notes and Remarks 

Lebesgue’s approach to integration is intimately tied to the notion of measurable functions.
Indeed, according to Hawkins [1970], “it was the properties of measurable functions
and the structure of the sets [$\{x : a < f(x) < b\}$] that guided Lebesgue’s
reasoning and led to his major results.” However, it is also fair to say that Lebesgue had
little interest in the formalities of measure and of measurable functions; his primary
interest was integration. The formal discussions of measurable sets and measurable
functions occupy but a few pages in the Leçons (Lebesgue [1928]).
Exercise 11 is based on the discussion in Wilansky [1953a]. Exercise 18 can be
traced to Hille and Tamarkin [1929]. 

Theorem 17.13 is due to D. F. Egorov [1911]. The clever proof presented here is due
to F. Riesz [1928b]. Necessary and sufficient conditions for almost uniform convergence
are given in R. G. Bartle [1980a]. Other variations, generalizations, and examples can
be found in Luther [1967], Rozycki [1965], Suckau [1935], and Weston [1959, 1960].

Much of the last section is adapted from, or at least influenced by, Sierpinski [1922] 
(and its references). Herein Sierpinski proves the theorems of Borel (Theorem 17.20; 
see Exercise 43 for a result that is closer in spirit to Borel’s original theorem), Frechet 
(Exercise 44), and Luzin (Exercises 46 and 47). 

N. N. Luzin (sometimes spelled “Lusin”) was a student of D. F. Egorov; not sur- 
prisingly, Luzin’s proof of his result is based on Egorov’s theorem. For an elementary 
proof of Luzin’s theorem, independent of Egorov’s theorem, see Oxtoby [1971]. For 
more on this student-adviser pair, see Allen Shields [1987b]. Shields’s article is highly 
recommended to any student with an adviser, and, likewise, to any adviser with a stu- 
dent: See Egorov’s letter to Luzin, quoted on p. 24 of the article, for a taste of a time 
gone by. 

Exercise 41 is a simple version of Tietze’s extension theorem, whereas Exercise 42 
(a) is an easy version of Urysohn’s lemma. See, for example, Folland [1984] for more 
general versions of these two theorems. 
The Lebesgue Integral


We’ve set the stage for the Lebesgue integral in the previous two chapters; now it’s time 
for the star to make her entrance. By way of a reminder, recall that we want our new 
integral to satisfy at least the following few, loosely stated properties: 

• $\int \chi_E = m(E)$, whenever $E$ is measurable.

• The integral should be linear: $\int (\alpha f + \beta g) = \alpha \int f + \beta \int g$.

• The integral should be positive (or monotone): $f \ge 0 \Rightarrow \int f \ge 0$ (or $f \ge g \Rightarrow
\int f \ge \int g$). In the presence of linearity, these are the same.

• The integral should be defined for a large class of functions, including at least the
bounded Riemann integrable functions, and it should coincide with the Riemann
integral whenever appropriate.

The first two properties tell us how to define the integral for simple functions. Once we
know how to integrate simple functions, the third property suggests how to define the
integral for nonnegative measurable functions: If $f \ge 0$ is measurable, then we can find
a sequence $(\varphi_n)$ of simple functions that increase to $f$. Now set $\int f = \lim_{n \to \infty} \int \varphi_n$.
Finally, linearity supplies the appropriate definition for the general case: If $f$ is measurable,
then $f^+$ and $f^-$ are nonnegative, measurable, and $f = f^+ - f^-$. So, set
$\int f = \int f^+ - \int f^-$, provided that this expression makes sense (we wouldn’t want
$\infty - \infty$, for example).

These few steps outline our plan of attack. If all goes well, we’ll find that the 
new integral is defined (and finite) for any bounded measurable function defined on a 
bounded interval - more than enough functions to recover the Riemann integral. 

Meanwhile, we will take some care to distinguish between this new integral and
the Riemann integral; in particular, the abbreviated notation $\int f$ in place of $\int f(x)\,dx$
is not simply an example of laziness, but rather is intended to further highlight this
distinction.

There are, of course, a few details to check along the way. We begin with the 
“obvious” case of defining the Lebesgue integral for simple functions. 


Simple Functions 

We say that a simple function $\varphi$ is (Lebesgue) integrable if the set $\{\varphi \ne 0\}$ has finite
measure (in short, if $\varphi$ has finite support). In this case, we may write the standard
representation for $\varphi$ as $\varphi = \sum_{i=0}^{n} a_i \chi_{A_i}$, where $a_0 = 0, a_1, \ldots, a_n$ are distinct real
numbers, where $A_0 = \{\varphi = 0\}, A_1, \ldots, A_n$ are pairwise disjoint and measurable, and
where only $A_0$ can have infinite measure. Once $\varphi$ is so written, there is an obvious definition
for $\int \varphi$, namely,


$$\int \varphi = \int_{\mathbb{R}} \varphi = \int_{-\infty}^{\infty} \varphi(x)\,dx = \sum_{i=1}^{n} a_i\, m(A_i).$$


In other words, by adopting the convention that $0 \cdot \infty = 0$, we define the Lebesgue
integral of $\varphi$ by

$$\int \left( \sum_{i=0}^{n} a_i \chi_{A_i} \right) = \sum_{i=0}^{n} a_i\, m(A_i).$$


Please note that $a_i\, m(A_i)$ is a product of real numbers for $i \ne 0$, and it is $0 \cdot \infty = 0$ for
$i = 0$; that is, $\int \varphi$ is a finite real number.

In brief, if $\varphi$ is an integrable simple function, then

$$\int \varphi = \sum_{a \in \mathbb{R}} a\, m\{\varphi = a\},$$

where the sum on the right actually involves only finitely many nonzero terms, each of
which is finite, provided that we take $0 \cdot \infty = 0$.

By way of an easy example, note that $\chi_{\mathbb{Q}}$ is Lebesgue integrable and that $\int \chi_{\mathbb{Q}} = 0$.
Our first chore is to check that the definition of $\int \varphi$ does not actually depend on any
particular representation of $\varphi$. This requires a couple of easy calculations.


Lemma 18.1. Let $\varphi$ be an integrable simple function, and let $\varphi = \sum_{i=1}^{n} b_i \chi_{E_i}$
be any representation with $E_1, \ldots, E_n$ disjoint and measurable. Then, $\int \varphi = \sum_{i=1}^{n} b_i\, m(E_i)$.


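proof. First note that for any $a \ne 0$ we have $\{\varphi = a\} = \bigcup E_i$, where the
union is over the set $\{i : b_i = a,\ 1 \le i \le n\}$. In particular, notice that
$a\, m\{\varphi = a\} = \sum_{b_i = a} b_i\, m(E_i)$, and that this is good even for $a = 0$, since then both sides are 0. Consequently,

$$\int \varphi = \sum_{a \in \mathbb{R}} a\, m\{\varphi = a\} = \sum_{a \in \mathbb{R}} \sum_{b_i = a} b_i\, m(E_i) = \sum_{i=1}^{n} b_i\, m(E_i). \qquad \square$$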
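The following minimal Python sketch illustrates Lemma 18.1. It uses a hypothetical encoding that is not from the text: each set $E_i$ is a finite disjoint union of intervals, so $m(E_i)$ is a sum of lengths. The helper names (`measure`, `term`, `integral_from_representation`, `integral_standard`) and the sample function `phi_rep` are illustrative only. The sketch evaluates $\int\varphi$ twice, once from a nonstandard representation as $\sum b_i\, m(E_i)$ and once from the standard form $\sum_a a\, m\{\varphi = a\}$. It imposes the convention $0 \cdot \infty = 0$ explicitly, because `0 * math.inf` evaluates to `nan` in Python.

```python
import math
from collections import defaultdict
from fractions import Fraction

def measure(intervals):
    """Lebesgue measure of a finite disjoint union of intervals (lo, hi)."""
    return sum(hi - lo for lo, hi in intervals)

def term(a, m):
    """a * m with the convention 0 * inf = 0 (plain floats would give nan)."""
    return 0 if a == 0 else a * m

def integral_from_representation(rep):
    """Lemma 18.1: sum of b_i * m(E_i) for any representation with disjoint E_i."""
    return sum(term(b, measure(E)) for b, E in rep)

def integral_standard(rep):
    """Standard form: sum over the distinct values a of a * m{phi = a}."""
    level_sets = defaultdict(list)
    for b, E in rep:
        level_sets[b].extend(E)  # {phi = a} is the union of the E_i with b_i = a
    return sum(term(a, measure(E)) for a, E in level_sets.items())

# phi = 3 on [0, 1) and on [2, 5/2), -1 on [5, 7), 0 on [10, inf).  The value 3 is
# deliberately split across two pieces, so this is not the standard representation.
phi_rep = [
    (3, [(0, 1)]),
    (-1, [(5, 7)]),
    (3, [(2, Fraction(5, 2))]),
    (0, [(10, math.inf)]),
]

print(integral_from_representation(phi_rep))  # 5/2
print(integral_standard(phi_rep))             # 5/2
```

The value 3 is split across two pieces, and a zero piece of infinite measure is added, yet the answer does not change. That is the content of the lemma in this special case.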

Using Lemma 18.1, we can easily check that the integral is both linear and positive 
on integrable simple functions. 


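Proposition 18.2. If $\varphi$ and $\psi$ are integrable simple functions, then for $\alpha, \beta \in \mathbb{R}$
we have $\int (\alpha\varphi + \beta\psi) = \alpha \int \varphi + \beta \int \psi$. If $\varphi \ge \psi$ a.e., then $\int \varphi \ge \int \psi$.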

proof. The heart of the matter here is to find representations for $\varphi$ and $\psi$ based
on a common partition of $\mathbb{R}$ so that we can readily combine and compare integrals,
and this is easy.

Write $\varphi = \sum_{i=0}^{n} a_i \chi_{A_i}$ and $\psi = \sum_{j=0}^{k} b_j \chi_{B_j}$, where $a_0 = 0, a_1, \ldots, a_n$ are
distinct, $b_0 = 0, b_1, \ldots, b_k$ are distinct, $A_0, \ldots, A_n$ are disjoint and measurable,
and $B_0, \ldots, B_k$ are disjoint and measurable. Then $\bigcup_{i=0}^{n} A_i = \mathbb{R} = \bigcup_{j=0}^{k} B_j$, both
being disjoint unions, and all but $A_0$ and $B_0$ have finite measure. Now we can
write $\mathbb{R} = \bigcup_{i=0}^{n} \bigcup_{j=0}^{k} (A_i \cap B_j)$. This is again a disjoint union, and all but $A_0 \cap B_0$
have finite measure.

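Using this new partition of $\mathbb{R}$ we may write

$$\varphi = \sum_{i=0}^{n} \sum_{j=0}^{k} a_i \chi_{A_i \cap B_j}, \qquad \psi = \sum_{i=0}^{n} \sum_{j=0}^{k} b_j \chi_{A_i \cap B_j},$$

and so

$$\alpha\varphi + \beta\psi = \sum_{i=0}^{n} \sum_{j=0}^{k} (\alpha a_i + \beta b_j)\, \chi_{A_i \cap B_j}.$$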

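The linearity of the integral is now an immediate consequence of Lemma 18.1:

$$\begin{aligned}
\int (\alpha\varphi + \beta\psi) &= \sum_{i=0}^{n} \sum_{j=0}^{k} (\alpha a_i + \beta b_j)\, m(A_i \cap B_j) \\
&= \alpha \sum_{i=0}^{n} \sum_{j=0}^{k} a_i\, m(A_i \cap B_j) + \beta \sum_{i=0}^{n} \sum_{j=0}^{k} b_j\, m(A_i \cap B_j) \\
&= \alpha \int \varphi + \beta \int \psi.
\end{aligned}$$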


Finally, if $\varphi - \psi \ge 0$ a.e., then $\int \varphi - \int \psi = \int (\varphi - \psi) \ge 0$, since any negative
values of $\varphi - \psi$ occur only on null sets. □
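The common-partition argument is easy to mechanize for step functions. This Python sketch uses an illustrative encoding, not the text's: an integrable step function is a list of `(lo, hi, value)` triples on pairwise disjoint half-open intervals $[lo, hi)$, and it is 0 elsewhere. The cells between consecutive endpoints play the role of the sets $A_i \cap B_j$. The hypothetical helper `combine` rewrites $\alpha\varphi + \beta\psi$ over that common partition, and the example checks linearity for one choice of $\varphi$, $\psi$, $\alpha$, $\beta$.

```python
def value_at(step, x):
    """Value of the step function at x (0 off its pieces)."""
    for lo, hi, v in step:
        if lo <= x < hi:
            return v
    return 0

def integral(step):
    """Sum of value * length over the pieces (Corollary 18.3)."""
    return sum(v * (hi - lo) for lo, hi, v in step)

def common_refinement(phi, psi):
    """Cells between consecutive endpoints; phi and psi are both constant on each."""
    points = sorted({p for lo, hi, _ in phi + psi for p in (lo, hi)})
    return list(zip(points, points[1:]))

def combine(alpha, phi, beta, psi):
    """alpha*phi + beta*psi written over the common refinement."""
    return [(lo, hi, alpha * value_at(phi, lo) + beta * value_at(psi, lo))
            for lo, hi in common_refinement(phi, psi)]

phi = [(0, 2, 1), (3, 4, 5)]
psi = [(1, 3, 2)]
alpha, beta = 3, -2

lhs = integral(combine(alpha, phi, beta, psi))
rhs = alpha * integral(phi) + beta * integral(psi)
print(lhs, rhs)  # 13 13
```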


Corollary 18.3. Given $a_1, \ldots, a_n \in \mathbb{R}$ and measurable sets $E_1, \ldots, E_n$, each with
finite measure, we have

$$\int \left( \sum_{i=1}^{n} a_i \chi_{E_i} \right) = \sum_{i=1}^{n} a_i\, m(E_i).$$


If $\varphi$ is an integrable simple function, and if $E$ is a measurable set, we also define

$$\int_E \varphi = \int \varphi \cdot \chi_E.$$

This makes sense since $\varphi \cdot \chi_E$ is again an integrable simple function. When $E = [a, b]$,
though, we usually just write $\int_a^b \varphi$.


Nonnegative Functions 

We next define the integral for nonnegative measurable functions. There is a bit of 
“upper and lower integral” going on here (which we will pursue later) but, in essence, 
the definition is based only on the monotonicity of the integral and what we already 
know about simple functions.

If $f : \mathbb{R} \to [0, \infty]$ is measurable, we define the Lebesgue integral of $f$ over $\mathbb{R}$ by

$$\int f = \sup \left\{ \int \varphi : 0 \le \varphi \le f,\ \varphi \text{ simple and integrable} \right\}.$$

We are not excluding the possibility that $\int f = \infty$ here. If $\int f < \infty$, then we will say
that $f$ is (Lebesgue) integrable on $\mathbb{R}$. Please note that in any case we obviously have
$\int f \ge 0$.

This definition is consistent with our first one. That is, if $\psi$ is a nonnegative, integrable, simple function, then

$$\int \psi = \sup \left\{ \int \varphi : 0 \le \varphi \le \psi,\ \varphi \text{ simple and integrable} \right\}.$$

(Why?) But the new definition says more: It defines $\int \psi$ for any nonnegative simple
function. In particular, if $E$ is any measurable set, then $\int \chi_E = m(E)$. This is clear if
$m(E) < \infty$, and when $m(E) = \infty$, we have

$$\int \chi_E \ge \sup_n \int \chi_{E \cap [-n, n]} = \sup_n m(E \cap [-n, n]) = m(E) = \infty.$$


It is easy to see that if $f$ and $g$ are nonnegative measurable functions with $f \le g$,
then $\int f \le \int g$. And it is virtually effortless to check that $\int (cf) = c \int f$ for $c \ge 0$.
Additivity is harder to check; in fact, we will stall the proof until we have gathered 
more equipment for the task. 

If $E$ is a measurable set, and if $f$ is nonnegative and measurable, we define

$$\int_E f = \int f \cdot \chi_E.$$

When $f$ is defined only on $E$, we simply take $f = 0$ outside of $E$. From our earlier
remarks, this, too, is consistent with the case for simple functions. Again, if $E = [a, b]$,
we will stick to the familiar notation $\int_a^b f$.

In our search for new machinery, an extremely important observation is that the
expression $\int_E f$ is a well-behaved function of the set $E$. For example, notice that if
$m(E) = 0$, then $\int_E f = 0$. Indeed, if $\varphi$ is an integrable simple function with $0 \le \varphi \le f\chi_E$, then we must have $\varphi = 0$ a.e., and hence $\int \varphi = 0$. (Why?) Also note that if $f \ge 0$
and if $E \subset F$ are measurable, then $\int_E f \le \int_F f$, since $f\chi_E \le f\chi_F$.

Along similar lines, if $f$ is bounded above on $E$, say $0 \le f \le K$ on $E$, then
$\int_E f \le K\, m(E)$, since $f\chi_E \le K\chi_E$ (see Figure 18.1). A somewhat more interesting
observation is that $f \ge \alpha\, \chi_{\{f \ge \alpha\}}$ for any $\alpha > 0$, and hence $\int f \ge \alpha\, m\{f \ge \alpha\}$ (see Figure 18.2). This timid little inequality ranks right up there with the triangle inequality
for utility per pound. It certainly merits stating again.


Chebyshev’s Inequality 18.4. If $f$ is nonnegative and measurable, then $\int f \ge \alpha\, m\{f \ge \alpha\}$ for any $\alpha > 0$.


Here is an immediate application:

Corollary 18.5. If $f$ is nonnegative and integrable, then $f$ is finite a.e.

[Figure 18.1 (the set $E$) and Figure 18.2 (the set $\{f \ge \alpha\}$)]

proof. Recall that $\{f = \infty\} = \bigcap_{n=1}^{\infty} \{f \ge n\}$. The sets $\{f \ge n\}$ decrease as
$n$ increases and, from Chebyshev’s inequality, $m\{f \ge n\} \le (1/n) \int f \to 0$ as
$n \to \infty$, since $f$ is integrable. Thus, $m\{f = \infty\} = \lim_{n\to\infty} m\{f \ge n\} = 0$. □
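Here is a quick numerical sanity check of Chebyshev’s inequality (an illustration, not a proof). It uses a nonnegative step function in an illustrative `(lo, hi, value)` encoding: values on pairwise disjoint intervals, 0 elsewhere. For such a function both $\int f$ and $m\{f \ge \alpha\}$ are finite sums, so the inequality $\alpha\, m\{f \ge \alpha\} \le \int f$ can be checked directly at a handful of levels $\alpha$.

```python
def integral(step):
    """Integral of a nonnegative step function: sum of value * length."""
    return sum(v * (hi - lo) for lo, hi, v in step)

def measure_at_least(step, alpha):
    """m{f >= alpha} for the step function (the pieces are disjoint)."""
    return sum(hi - lo for lo, hi, v in step if v >= alpha)

f = [(0, 1, 4), (1, 3, 1), (3, 7, 0.5)]    # integral = 4 + 2 + 2 = 8
for alpha in [0.25, 0.5, 1, 2, 4, 8]:
    lower = alpha * measure_at_least(f, alpha)
    assert lower <= integral(f)            # Chebyshev's inequality at this level
    print(f"alpha = {alpha}: alpha * m{{f >= alpha}} = {lower} <= {integral(f)}")
```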


EXERCISES

▷ 1. If $\psi$ is a nonnegative simple function, check that

$$\int \psi = \sup \left\{ \int \varphi : 0 \le \varphi \le \psi,\ \varphi \text{ simple and integrable} \right\}.$$

2. Let $f : \mathbb{R} \to [0, \infty]$ be integrable and define $F : [0, \infty) \to [0, \infty]$ by $F(\alpha) =
m\{f > \alpha\}$. Show that $F$ is decreasing and right-continuous, and that $F(\alpha) \to 0$ as
$\alpha \to \infty$. [Hint: $f$ is finite a.e.]

▷ 3. Prove that $\int_1^\infty (1/x)\,dx = \infty$ (as a Lebesgue integral).


We next roll up our sleeves and tackle the question of additivity of the integral. As
was suggested earlier, we will consider $\int_E f$ as a function of the set $E$. What we will
find is that the function $\mu(E) = \int_E f$, $E \in \mathcal{M}$, is a measure on $\mathcal{M}$. This means that $\mu$ is
nonnegative, monotone, $\mu(\emptyset) = 0$, and, most importantly, that $\mu$ is countably additive.
We have already checked a few of these properties; the hard work comes in establishing
countable additivity. We begin with a special case:

Lemma 18.6. Let $\varphi$ be an integrable simple function. If $E_1 \subset E_2 \subset \cdots$ is
an increasing sequence of measurable sets, and if $E = \bigcup_{n=1}^{\infty} E_n$, then $\int_E \varphi =
\lim_{n\to\infty} \int_{E_n} \varphi$.

proof. Write $\varphi = \sum_{i=1}^{k} a_i \chi_{A_i}$, where each $a_i \ne 0$ and where the $A_i$ are pairwise
disjoint measurable sets, each having finite measure. Now, let $(E_n)$ be an increasing
sequence of measurable sets, and let $E = \bigcup_{n=1}^{\infty} E_n$. Then, $\int_E \varphi = \int \varphi \cdot \chi_E =
\sum_{i=1}^{k} a_i\, m(A_i \cap E)$. And now we appeal to the fact that Lebesgue measure is
countably additive, à la Lemma 16.23 (i), to write

$$\int_E \varphi = \sum_{i=1}^{k} a_i\, m(A_i \cap E) = \sum_{i=1}^{k} a_i \lim_{n\to\infty} m(A_i \cap E_n) = \lim_{n\to\infty} \sum_{i=1}^{k} a_i\, m(A_i \cap E_n) = \lim_{n\to\infty} \int_{E_n} \varphi. \qquad \square$$
We used the fact that Lebesgue measure is countably additive to establish the “continuity” results of Lemma 16.23. It is not hard to see, though, that the conclusion of
Lemma 16.23 (i) is actually equivalent to the countable additivity of $m$. In the same
way, Lemma 18.6 actually shows that the map $\mu(E) = \int_E \varphi$ is a measure on $\mathcal{M}$. See
Exercise 8 for more details.

We will use Lemma 18.6 to prove a result of fundamental importance: 

Monotone Convergence Theorem 18.7. If $0 \le f_1 \le f_2 \le \cdots$ is an increasing
sequence of nonnegative measurable functions, then

$$\int \left( \lim_{n\to\infty} f_n \right) = \lim_{n\to\infty} \int f_n.$$


proof. Since the $f_n$ increase, note that $f = \lim_{n\to\infty} f_n = \sup_n f_n$ exists and is
also nonnegative and measurable. And since we also have $\int f_n \le \int f_{n+1} \le \int f$
for all $n$, we have that $\lim_{n\to\infty} \int f_n$ exists and satisfies $\lim_{n\to\infty} \int f_n \le \int f$.

We need to show that $\lim_{n\to\infty} \int f_n \ge \int f$. Of course, given $\varepsilon > 0$, it would
be enough to show that $\lim_{n\to\infty} \int f_n \ge (1 - \varepsilon) \int f$. To do this, it is enough to
show that $\lim_{n\to\infty} \int f_n \ge (1 - \varepsilon) \int \varphi$ for any integrable simple function $\varphi$ with
$0 \le \varphi \le f$. (Why?)

Let $\varphi$ be an integrable simple function with $0 \le \varphi \le f$, and consider the sets
$E_n = \{f_n \ge (1 - \varepsilon)\varphi\}$. Note that $E_n$ is measurable and that, since $f_n \le f_{n+1}$, we
have $E_n \subset E_{n+1}$. Also, since $f_n \uparrow f \ge \varphi$, we have that $\bigcup_{n=1}^{\infty} E_n = \mathbb{R}$.
(Why?)

Now we apply Lemma 18.6. Since

$$\int f_n \ge \int_{E_n} f_n \ge \int_{E_n} (1 - \varepsilon)\varphi = (1 - \varepsilon) \int_{E_n} \varphi$$

for all $n$, we have

$$\lim_{n\to\infty} \int f_n \ge (1 - \varepsilon) \lim_{n\to\infty} \int_{E_n} \varphi = (1 - \varepsilon) \int \varphi. \qquad \square$$

The fact that the integral commutes with increasing limits allows us to put an interesting twist on our Basic Construction.

Corollary 18.8. If $f$ is a nonnegative measurable function, then there is an increasing sequence of integrable simple functions $0 \le \varphi_1 \le \varphi_2 \le \cdots \le f$ such
that $f = \lim_{n\to\infty} \varphi_n$ and $\int f = \lim_{n\to\infty} \int \varphi_n$.

proof. Let $(\psi_n)$ be any sequence of nonnegative simple functions that increase
pointwise to $f$. For example, take $(\psi_n)$ to be the sequence of simple functions
given by the Basic Construction. We need to show that the $\psi_n$ can be replaced
by a sequence of integrable simple functions. But this is easy: Just take $\varphi_n =
\psi_n \cdot \chi_{[-n, n]}$. Each $\varphi_n$ is now supported on a set of finite measure, and hence is
integrable, and $(\varphi_n)$ increases pointwise to $f$ since $(\chi_{[-n, n]})$ increases pointwise to
$\chi_{\mathbb{R}}$, the constant 1 function. It follows from the Monotone Convergence Theorem
that $\int \varphi_n$ increases to $\int f$. □
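The sketch below watches $\int \varphi_n$ increase to $\int f$ for the concrete choice $f(x) = x^2$ on $[0, 1]$ (and $f = 0$ elsewhere). Taking on faith for the moment that $\int f = 1/3$ here, as for the Riemann integral, it assumes the usual dyadic form of the Basic Construction, $\varphi_n = \sum_{k=1}^{n2^n} \frac{k-1}{2^n} \chi_{\{(k-1)/2^n \le f < k/2^n\}} + n\,\chi_{\{f \ge n\}}$. Because $f$ vanishes off $[0, 1]$, the truncation by $\chi_{[-n, n]}$ used in the proof changes nothing here. The function names `basic_construction_integral`, `level_measure`, and `tail_measure` are illustrative.

```python
import math

def basic_construction_integral(n, level_measure, tail_measure):
    """Integral of the n-th Basic Construction function phi_n for some f >= 0.

    level_measure(lo, hi) must return m{lo <= f < hi}, and tail_measure(t)
    must return m{f >= t}.  The k = 1 level has coefficient 0, so it is
    skipped (this is where the convention 0 * inf = 0 is used).
    """
    step = 2.0 ** -n
    total = 0.0
    for k in range(2, n * 2**n + 1):
        lo, hi = (k - 1) * step, k * step
        total += lo * level_measure(lo, hi)
    return total + n * tail_measure(n)

# f(x) = x^2 on [0, 1] and f = 0 elsewhere.  For 0 < lo < hi the level set
# {lo <= f < hi} is [sqrt(lo), sqrt(hi)) intersected with [0, 1].
def level_measure(lo, hi):
    return max(0.0, min(1.0, math.sqrt(hi)) - min(1.0, math.sqrt(lo)))

def tail_measure(t):
    """m{f >= t} for t > 0: the interval [sqrt(t), 1] when t <= 1, else a null set."""
    return max(0.0, 1.0 - math.sqrt(t))

integrals = [basic_construction_integral(n, level_measure, tail_measure)
             for n in range(1, 13)]
for n, value in enumerate(integrals, start=1):
    print(n, value)  # increases toward 1/3
```

Since $0 \le f - \varphi_n \le 2^{-n}$ on $[0, 1]$, the gap $1/3 - \int \varphi_n$ is at most $2^{-n}$.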

The point to Corollary 18.8 is that both $f$ and $\int f$ are completely determined by
the sequence $(\varphi_n)$. We might have even used this fact to define $\int f$. In any case, the
additivity of the integral is now a piece of cake: We already know that the integral is 
additive over simple functions, and we know that limits are additive. The rest is easy. 

Corollary 18.9. If $f$ and $g$ are nonnegative measurable functions, then $\int (f + g) = \int f + \int g$. In particular, $\int_{E \cup F} f = \int_E f + \int_F f$ for any disjoint measurable
sets $E$, $F$.


proof. Choose two sequences of nonnegative, integrable simple functions: $(\varphi_n)$
increasing to $f$ and $(\psi_n)$ increasing to $g$. Then, $(\varphi_n + \psi_n)$ increases to $f + g$ and
so, by applying the Monotone Convergence Theorem (no less than three times!),
we have

$$\int (f + g) = \lim_{n\to\infty} \int (\varphi_n + \psi_n) = \lim_{n\to\infty} \left( \int \varphi_n + \int \psi_n \right) = \int f + \int g. \qquad \square$$


EXERCISES

▷ 4. Find a sequence $(f_n)$ of nonnegative measurable functions such that $\lim_{n\to\infty} f_n = 0$, but $\lim_{n\to\infty} \int f_n = 1$. In fact, show that $(f_n)$ can be chosen to converge
uniformly to 0.

5. Suppose that $f$ and $g$ are measurable with $0 \le f \le g$. If $g$ is integrable, show
that $f$ and $g - f$ are integrable and that $\int (g - f) = \int g - \int f$. In fact, the formula
is still true even if we assume only that $f$ is integrable.

6. Suppose that $f$ and $(f_n)$ are nonnegative measurable functions, that $(f_n)$ decreases pointwise to $f$, and that $\int f_k < \infty$ for some $k$. Prove that $\int f =
\lim_{n\to\infty} \int f_n$. [Hint: Consider $(f_k - f_n)$ for $n \ge k$.] Give an example showing
that this fails without the assumption that $\int f_k < \infty$ for some $k$.


We are halfway home: The set function $\mu(E) = \int_E f$ is nonnegative, monotone,
and finitely additive. We next consider the null sets for $\mu$. Here, finally, we will see
a connection with the underlying function $f$. In brief, our next result tells us that the
integral ignores the letters “a.e.”

Lemma 18.10. Let $f$ be nonnegative and measurable. Then, $\int f = 0$ if and only
if $f = 0$ a.e.

proof. First suppose that $f = 0$ a.e. Then, $m\{f > 0\} = 0$ and, hence,

$$\int f = \int_{\{f = 0\}} f + \int_{\{f > 0\}} f = 0 + 0. \quad \text{(Why?)}$$

Next suppose that $\int f = 0$. To compute $m\{f > 0\}$, we first use Chebyshev’s
inequality to note that

$$m\{f \ge 1/n\} \le n \int f = 0$$

for all $n$. Since $\{f > 0\} = \bigcup_{n=1}^{\infty} \{f \ge 1/n\}$, we get $m\{f > 0\} = 0$. □

Our two applications of Chebyshev’s inequality provide some insight into how integrable functions are “built.” If $f$ is nonnegative and integrable, then $m\{f = \infty\} = 0$
since $m\{f \ge n\} \le (1/n) \int f \to 0$. What’s more, the support of $f$, that is, the
set $\{f \ne 0\}$, can be written as an increasing union of sets of finite measure:
$\{f > 0\} = \bigcup_{n=1}^{\infty} \{f \ge 1/n\}$ and $m\{f \ge 1/n\} \le n \int f < \infty$. (This still allows
$m\{f > 0\} = \infty$, of course.) Once we bring the Monotone Convergence Theorem into
the picture, we can say even more. Consider the following string of equations:

$$\int_{-\infty}^{\infty} f = \lim_{n\to\infty} \int_{-n}^{n} f = \lim_{n\to\infty} \int_{-n}^{n} (f \wedge n) = \lim_{n\to\infty} \int_{\{f \le n\}} f.$$

The first two limits are good for any nonnegative measurable function $f$. In order that
the third limit equal $\int f$, it is necessary that $f$ be finite a.e. (Why?)

The Monotone Convergence Theorem easily allows us to consider series of nonnegative functions. The following corollary is actually equivalent to the Monotone
Convergence Theorem, but it’s well worth the effort of a separate statement. In this
form it’s often called the Beppo Levi theorem, after its creator.

Corollary 18.11. If $(f_n)$ is a sequence of nonnegative measurable functions, then

$$\int \left( \sum_{n=1}^{\infty} f_n \right) = \sum_{n=1}^{\infty} \int f_n.$$

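proof. Note that since the $f_n$ are nonnegative, both infinite sums exist: The partial
sums $\sum_{n=1}^{N} f_n$ increase to $\sum_{n=1}^{\infty} f_n$, while from monotonicity and additivity of the
integral we have that $\int \left( \sum_{n=1}^{N} f_n \right) = \sum_{n=1}^{N} \int f_n$ increases to $\sum_{n=1}^{\infty} \int f_n$. The
Monotone Convergence Theorem finishes the job:

$$\int \left( \sum_{n=1}^{\infty} f_n \right) = \int \left( \lim_{N\to\infty} \sum_{n=1}^{N} f_n \right) = \lim_{N\to\infty} \int \left( \sum_{n=1}^{N} f_n \right) = \lim_{N\to\infty} \sum_{n=1}^{N} \int f_n = \sum_{n=1}^{\infty} \int f_n. \qquad \square$$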
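Here is a small worked instance of Corollary 18.11, offered as an illustration. Take $f_n(x) = x^n(1 - x)$ for $0 \le x \le 1$ and $f_n = 0$ elsewhere ($n \ge 1$). Take on faith for the moment (it is Theorem 18.16 below) that the Lebesgue and Riemann integrals of a continuous function on $[0, 1]$ agree. Summing first and then integrating gives the same value as integrating term by term:

```latex
\begin{aligned}
\int_0^1 \Bigl( \sum_{n=1}^{\infty} x^n (1 - x) \Bigr) dx
  &= \int_0^1 x\,dx = \frac{1}{2}
  && \text{since } \sum_{n \ge 1} x^n (1 - x) = x \text{ for } 0 \le x < 1, \\
\sum_{n=1}^{\infty} \int_0^1 x^n (1 - x)\,dx
  &= \sum_{n=1}^{\infty} \Bigl( \frac{1}{n+1} - \frac{1}{n+2} \Bigr) = \frac{1}{2}.
\end{aligned}
```

At the single point $x = 1$ the series sums to $0$ rather than $1$. A one-point set is a null set, so this does not affect the integral.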

Here, finally, is the result we were looking for:

Corollary 18.12. If $f$ is nonnegative and measurable, then the map $E \mapsto \int_E f$
is a measure on $\mathcal{M}$. In particular, if $(E_n)$ is a sequence of pairwise disjoint
measurable sets, then

$$\int_{\bigcup_{n=1}^{\infty} E_n} f = \sum_{n=1}^{\infty} \int_{E_n} f.$$

Again, the upshot of this observation is that the map $E \mapsto \int_E f$ has certain “continuity” properties. See Exercise 17 for a particularly striking result along these lines.
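Here is a concrete instance of Corollary 18.12. Again take on faith, until Theorem 18.16, that Lebesgue and Riemann integrals agree for continuous functions on bounded intervals. Let $f(x) = x^{-2}$ for $x \ge 1$ and $f = 0$ otherwise, and split $[1, \infty)$ into the pairwise disjoint sets $E_n = [n, n+1)$, $n \ge 1$:

```latex
\int_{[1,\infty)} f
  = \sum_{n=1}^{\infty} \int_{[n,\,n+1)} x^{-2}\,dx
  = \sum_{n=1}^{\infty} \Bigl( \frac{1}{n} - \frac{1}{n+1} \Bigr)
  = 1 .
```

The same value also arises as $\lim_{N\to\infty} \int_1^N x^{-2}\,dx = \lim_{N\to\infty} (1 - 1/N)$, in line with the continuity of $E \mapsto \int_E f$ along increasing unions.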


EXERCISES

7. Let $\mu : \mathcal{A} \to [0, \infty]$ be a nonnegative, finitely additive set function defined
on a $\sigma$-algebra $\mathcal{A}$. Prove that:

(i) $\mu(E) \le \mu(F)$ whenever $E, F \in \mathcal{A}$ satisfy $E \subset F$.

(ii) if $\mu(\emptyset) \ne 0$, then $\mu(E) = \infty$ for all $E \in \mathcal{A}$.

8. Let $\mu : \mathcal{A} \to [0, \infty]$ be a nonnegative, finitely additive set function defined
on a $\sigma$-algebra $\mathcal{A}$. Prove that the following are equivalent:

(i) $\mu\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} \mu(E_n)$ for every sequence of pairwise disjoint sets $(E_n)$
in $\mathcal{A}$.

(ii) $\mu\left(\bigcup_{n=1}^{\infty} E_n\right) = \lim_{n\to\infty} \mu(E_n)$ for every increasing sequence of sets $(E_n)$ in $\mathcal{A}$.

▷ 9. Let $f$ be measurable with $f > 0$ a.e. If $\int_E f = 0$ for some measurable set $E$,
show that $m(E) = 0$.

▷ 10. If $f$ is nonnegative and measurable, show that $\int_{-\infty}^{\infty} f = \lim_{n\to\infty} \int_{-n}^{n} f =
\lim_{n\to\infty} \int_{-n}^{n} (f \wedge n)$.

▷ 11. If $f$ is nonnegative and integrable, show that $\int_{-\infty}^{\infty} f = \lim_{n\to\infty} \int_{\{f \le n\}} f$.

▷ 12. True or False? If $f$ is nonnegative and integrable, then $\lim_{x\to\pm\infty} f(x) = 0$.
Explain.

13. Let $f : [0, 1] \to [0, 1]$ be the Cantor function. Show that $\int_0^1 f = 1/2$. [Hint:
$f$ is constant on each interval in the complement of $\Delta$.]

14. Define $f : [0, 1] \to [0, \infty)$ by $f(x) = 0$ if $x$ is rational and $f(x) = 2^n$ if
$x$ is irrational with exactly $n = 0, 1, 2, \ldots$ leading zeros in its decimal expansion.
Show that $f$ is measurable, and find $\int_0^1 f$.

15. Let $f$ be nonnegative and measurable. Prove that $\int f < \infty$ if and only if
$\sum_{k=-\infty}^{\infty} 2^k m\{f \ge 2^k\} < \infty$.

16. Let $f$ be nonnegative and integrable. Given $\varepsilon > 0$, show that there is a measurable set $E$ with $m(E) < \infty$ such that $\int_E f > \int f - \varepsilon$. Moreover, show that $E$ can
be chosen so that $f$ is bounded (above) on $E$.

17. If $f$ is nonnegative and integrable, prove that the function $F(x) = \int_{-\infty}^{x} f$ is
continuous. In fact, even more is true: Given $\varepsilon > 0$, show that there is a $\delta > 0$
such that $\int_E f < \varepsilon$ whenever $m(E) < \delta$. [Hint: This is easy if $f$ is bounded; see
Exercise 16.]


By now you’ve noticed how effortlessly we’ve been able to exchange limits and
integrals, at least in certain cases. If you’ll take it on faith, temporarily, that the Lebesgue
integral includes the Riemann integral as a special case, then you’ll certainly agree that
we’ve improved on our old integral. Of course, as the exercises point out, even the
Lebesgue integral won’t commute with all limits. Nevertheless, we can always at least
compare $\int \liminf_{n\to\infty} f_n$ and $\liminf_{n\to\infty} \int f_n$. Our next result tells us how; it’s a useful little
gem!

Fatou’s Lemma 18.13. If $(f_n)$ is a sequence of nonnegative measurable functions, then

$$\int \left( \liminf_{n\to\infty} f_n \right) \le \liminf_{n\to\infty} \int f_n.$$



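proof. Let $g_n = \inf\{f_n, f_{n+1}, \ldots\}$. Then $g_n$ is nonnegative, measurable, and
$(g_n)$ increases to $\liminf_{k\to\infty} f_k$. From the Monotone Convergence Theorem,
$\int (\liminf_{n\to\infty} f_n) = \lim_{n\to\infty} \int g_n$. It remains only to estimate $\lim_{n\to\infty} \int g_n$.
But,

$$\int g_n \le \int f_k \quad \text{for } k \ge n.$$

Thus,

$$\lim_{n\to\infty} \int g_n \le \lim_{n\to\infty} \inf_{k \ge n} \int f_k = \liminf_{n\to\infty} \int f_n. \qquad \square$$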


Just for good measure, here’s the proof of Fatou’s Lemma in one line:

$$\int \liminf_{n\to\infty} f_n = \lim_{n\to\infty} \int \inf_{k \ge n} f_k \le \lim_{n\to\infty} \inf_{k \ge n} \int f_k = \liminf_{n\to\infty} \int f_n.$$

Of course, should both $\lim_{n\to\infty} f_n$ and $\lim_{n\to\infty} \int f_n$ exist, then Fatou’s Lemma assures
us that $\int \lim_{n\to\infty} f_n \le \lim_{n\to\infty} \int f_n$.

EXERCISES

18. Show that strict inequality is possible in Fatou’s Lemma. [Hint: Consider $f_n = \chi_{[n, n+1]}$.]

19. If $(f_n)$ is a sequence of nonnegative measurable functions, is it true that
$\limsup_{n\to\infty} \int f_n \le \int (\limsup_{n\to\infty} f_n)$? What if $(f_n)$ is uniformly bounded?

20. If $f$ and $(f_n)$ are nonnegative measurable functions, and if $f_n \to f$ a.e., prove
that $\int f \le \liminf_{n\to\infty} \int f_n$.

21. Suppose that $f$ and $(f_n)$ are nonnegative measurable functions, that $f =
\lim_{n\to\infty} f_n$, and that $f_n \le f$ for all $n$. Show that $\int f = \lim_{n\to\infty} \int f_n$.

▷ 22. Suppose that $f$ and $(f_n)$ are nonnegative measurable functions, that $f =
\lim_{n\to\infty} f_n$, and that $\int f = \lim_{n\to\infty} \int f_n < \infty$. Prove that $\int_E f = \lim_{n\to\infty} \int_E f_n$
for any measurable set $E$. [Hint: Consider both $\int_E f$ and $\int_{E^c} f$.] Give an example
showing that this need not be true if $\int f = \lim_{n\to\infty} \int f_n = \infty$.


The General Case 

We are now ready to define the Lebesgue integral for the general measurable function
$f : \mathbb{R} \to [-\infty, \infty]$. As you will recall, if $f$ is measurable, then so are the positive and
negative parts of $f$:

$$f^+ = f \vee 0 \quad\text{and}\quad f^- = -(f \wedge 0).$$

Recall, too, that $f^+$ and $f^-$ satisfy

$$f = f^+ - f^- \quad\text{and}\quad |f| = f^+ + f^-$$

and also $(f^+)(f^-) = 0 = f^+ \wedge f^-$ (that is, $f^+$ and $f^-$ are disjointly supported).

We now define the Lebesgue integral of $f$ in the only way we can! If at least one of
$f^+$ or $f^-$ is integrable, we define

$$\int f = \int f^+ - \int f^-;$$

otherwise, $\int f$ is not defined (after all, we cannot allow $\infty - \infty$). If both $f^+$ and
$f^-$ are integrable, then we say that $f$ is (Lebesgue) integrable. This is precisely the
condition that is needed to force $\int f$ to be a real number. But please note that this differs
substantially from Riemann integrability; in fact, it is worth repeating:

$$\begin{aligned}
f \text{ is Lebesgue integrable}
&\iff \text{both } f^+ \text{ and } f^- \text{ are integrable} \\
&\iff \int f^+ < \infty \text{ and } \int f^- < \infty \\
&\iff \int f^+ + \int f^- < \infty \quad (\text{since each is } \ge 0) \\
&\iff \int |f| < \infty \\
&\iff |f| \text{ is Lebesgue integrable.}
\end{aligned}$$

By way of a quick example, note that $f = 2\chi_{\mathbb{Q} \cap [0, 1]} - 1$ is not Riemann integrable on
$[0, 1]$ while $|f| = 1$ is.
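For a step function, the chain of equivalences displayed above can be checked by hand. The Python sketch below assumes an illustrative `(lo, hi, value)` encoding on pairwise disjoint intervals (0 elsewhere). It builds $f^+ = f \vee 0$ and $f^- = -(f \wedge 0)$ piece by piece and confirms three facts: $\int f = \int f^+ - \int f^-$, $\int |f| = \int f^+ + \int f^-$, and $\left|\int f\right| \le \int |f|$. The helper names are hypothetical.

```python
def integral(step):
    """Sum of value * length over disjoint pieces (value 0 elsewhere)."""
    return sum(v * (hi - lo) for lo, hi, v in step)

def positive_part(step):
    """f^+ = max(f, 0), piece by piece."""
    return [(lo, hi, max(v, 0)) for lo, hi, v in step]

def negative_part(step):
    """f^- = -min(f, 0) = max(-f, 0), piece by piece."""
    return [(lo, hi, max(-v, 0)) for lo, hi, v in step]

def absolute_value(step):
    """|f|, piece by piece."""
    return [(lo, hi, abs(v)) for lo, hi, v in step]

f = [(0, 2, 3), (2, 5, -4), (6, 7, 1)]
pos, neg = integral(positive_part(f)), integral(negative_part(f))
print(integral(f), pos, neg, integral(absolute_value(f)))  # -5 7 12 19
```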

If $E$ is a measurable set, then, as before, we define

$$\int_E f = \int f \chi_E,$$

provided that this makes sense, of course. As usual, if $f$ is defined only on $E$, we simply
extend $f$ to all of $\mathbb{R}$ by setting $f = 0$ outside $E$. In this case, $\int_E f = \int f$. In either case,
$\int_E f$ depends only on the restriction of $f$ to $E$. If $\int_E |f| < \infty$, then we will say that $f$
is integrable on $E$.

High on our list of projects is to show that the collection of integrable functions is a
vector space and that the integral is a linear real-valued function on this space. Before
we attack these issues, though, let’s make a few simple observations. 

Observations 18.14 

(a) One more time: $f$ is integrable if and only if $|f|$ is integrable and, in either case,
$\left| \int f \right| \le \int |f|$. (Why?)

(b) If $f$ is integrable, then $f$ is finite a.e.; that is, $m\{|f| = \infty\} = 0$.

(c) If $f$ is integrable and $m(E) = 0$, then $\int_E f = 0$. Together with our second
observation, this says that we might as well consider an integrable function
to be real-valued. Notice, too, that our new definition is in accord with our
previous definition; in fact, if $f \ge 0$ a.e., then $f = f^+$ a.e., and so $\int f = \int f^+ \ge 0$.

(d) If $f$ and $g$ are measurable with $|f| \le |g|$ a.e., and if $g$ is integrable, then $f$ is
integrable, too, and $\int |f| \le \int |g|$. In particular, if we also have $f = g$ a.e., then
$f$ is integrable and $\int f = \int g$.

(e) If $f : [a, b] \to \mathbb{R}$ is bounded and measurable, then $f$ is integrable on $[a, b]$.
In particular, if $f$ is a bounded Riemann integrable function on $[a, b]$, then $f$
is also Lebesgue integrable on $[a, b]$. What’s more, as we will see shortly, the
two integrals agree; that is, $(R) \int_a^b f(x)\,dx = (L) \int_a^b f$. For the time being, or at
least until we prove that the Lebesgue integral subsumes the Riemann integral,
we will distinguish between the two integrals, when necessary, by using the
prefixes $(R)$ and $(L)$.

(f) Given a Lebesgue measurable function $f$, recall that there is a Borel measurable
function $g$ with $f = g$ a.e. If we were only interested in computing integrals,
this means that we would only need Borel measurable functions.

We denote the collection of Lebesgue integrable functions $f$ defined on $\mathbb{R}$ by $L_1(\mathbb{R})$.
Given a measurable set $E$, we will also consider the collection of functions integrable
on $E$, which we denote by $L_1(E)$. More precisely, $L_1(E)$ is the collection of measurable
functions $f$ defined on $E$ for which $\int_E |f| < \infty$. Equivalently, $L_1(E)$ consists of all
functions of the form $f\chi_E$, where $f : \mathbb{R} \to [-\infty, \infty]$ is measurable and $f\chi_E$ is
integrable. The point here is that when considering the collection $L_1(E)$, we do not care
much about what goes on outside the set $E$; the elements of $L_1(E)$ need not be defined
outside of $E$.

As you might imagine, we are most interested in the case where $E$ is an interval;
that is, we are interested in the spaces $L_1[a, b]$, $L_1[0, \infty)$, $L_1(\mathbb{R})$, and so on. But, as
it happens, the vector and metric space properties that we are concerned with will not
actually depend on the underlying set $E$. For this reason, we may occasionally just
write $L_1$ to denote a typical space $L_1(E)$. In fact, there is no real harm in thinking of
$L_1 = L_1(\mathbb{R})$ as the typical case (and this is precisely what we will do).

Before we compare our new integral to the Riemann integral, let’s at least establish 
a few of the familiar properties of the integral in the case of the Lebesgue integral. 
As you have no doubt grown accustomed to by now, we will interpret the elementary 
properties of the integral in terms of the vector space and lattice structure of the entire 
collection of integrable functions. 

Proposition 18.15. $L_1$ is a vector space and a lattice, under the usual pointwise
operations on functions, and the Lebesgue integral is a positive, linear, real-valued
function on the space.

proof. Given $f, g \in L_1$ and $a, b \in \mathbb{R}$, we have $|af + bg| \le |a|\,|f| + |b|\,|g|$ a.e.
(at least where $f$ and $g$ are real-valued). Thus, $af + bg \in L_1$.

That $L_1$ is a lattice now follows from the fact that it is a linear space containing the absolute values of its elements. Specifically, if $f$ and $g$ are integrable,
then

$$f \vee g = \tfrac{1}{2}\left( |f - g| + f + g \right) \quad \text{a.e.}$$

and so $|f \vee g| \le |f| + |g|$ a.e. Thus, $f \vee g \in L_1$. Similarly for $f \wedge g$.

Now, to show that integration is linear and positive is easy. First notice that
$(af)^{\pm} = af^{\pm}$ for $a \ge 0$, and that $(af)^{\pm} = -af^{\mp}$ for $a < 0$. From this it is easy
to check that $\int (af) = a \int f$ for any $f \in L_1$ and any $a \in \mathbb{R}$. Next, if $f, g \in L_1$,
then

$$(f + g)^+ - (f + g)^- = f + g = f^+ - f^- + g^+ - g^- \quad \text{a.e.},$$

or at least where $f^{\pm}$ and $g^{\pm}$ are all real-valued. Thus,

$$(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+ \quad \text{a.e.},$$

and both sides represent nonnegative measurable functions. Since the integral is
additive for nonnegative functions (and since it ignores that “a.e.”),

$$\int (f + g)^+ + \int f^- + \int g^- = \int (f + g)^- + \int f^+ + \int g^+.$$

Now, since each of these integrals is finite, we can rearrange them to read

$$\int (f + g) = \int (f + g)^+ - \int (f + g)^- = \int f^+ - \int f^- + \int g^+ - \int g^- = \int f + \int g.$$

Finally, as we have already observed, if $f \in L_1$ and if $f \ge 0$ a.e., then $\int f \ge 0$.
Combining this with the linearity of the integral, we have $\int f \ge \int g$ whenever $f$,
$g \in L_1$ satisfy $f \ge g$ a.e. □

Please take note of the fact that in each of the various calculations in Proposition 
18.15 we were able to draw conclusions based only on the almost everywhere validity 
of equations and inequalities. We will have more to say about this fact later. 

For now, let’s recover all of elementary calculus in one stroke: 


Theorem 18.16. If $f : [a, b] \to \mathbb{R}$ is a bounded Riemann integrable function,
then $f$ is also Lebesgue integrable on $[a, b]$ and the two integrals agree:

$$(R) \int_a^b f(x)\,dx = (L) \int_a^b f.$$

proof. From Theorem 17.4 and Observation 18.14 (e), all we need to establish
here is the equality of the integrals. What makes this possible is the fact that the
two integrals clearly coincide for step functions. (Why?)

Since $f$ is Riemann integrable, we can find two sequences of step functions
$(\ell_n)$ and $(u_n)$ with $\ell_n \le f \le u_n$ such that

$$\sup_n \int_a^b \ell_n = (R) \int_a^b f(x)\,dx = \inf_n \int_a^b u_n.$$

(Notice that we do not need to distinguish between the Riemann and the Lebesgue
integrals for either the $\ell_n$ or the $u_n$.) But, from the monotonicity of the Lebesgue
integral, we have

$$\sup_n \int_a^b \ell_n \le (L) \int_a^b f \le \inf_n \int_a^b u_n.$$

Thus, $(R) \int_a^b f(x)\,dx = (L) \int_a^b f$.

At the price of a few more lines, we can actually show something more. If
we define $\ell = \sup_n \ell_n$ and $u = \inf_n u_n$, then $\ell$ and $u$ are bounded, measurable
functions on $[a, b]$ satisfying $\ell \le f \le u$. Moreover,

$$\sup_n \int_a^b \ell_n \le (L) \int_a^b \ell \le (L) \int_a^b u \le \inf_n \int_a^b u_n.$$

Thus, $(L) \int_a^b \ell = (L) \int_a^b u$. It follows that $\ell = f = u$ a.e. (How?) This gives
another proof that $f$ is Lebesgue measurable. □
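The proof squeezes the Lebesgue integral between integrals of lower and upper step functions. The Python sketch below computes those step-function integrals exactly, using `fractions.Fraction`, for $f(x) = x^2$ on $[0, 1]$ with dyadic partitions. For step functions the Riemann and Lebesgue integrals coincide, so these numbers serve for both. The sketch assumes $f$ is increasing, so that the infimum and supremum of $f$ on each cell occur at the left and right endpoints. The helper name `step_integrals` is illustrative.

```python
from fractions import Fraction

def step_integrals(f, a, b, n):
    """Integrals of the lower and upper step functions for an increasing f
    on n equal subintervals of [a, b], in exact rational arithmetic."""
    h = Fraction(b - a, n)
    x = [a + i * h for i in range(n + 1)]
    lower = sum(f(x[i]) * h for i in range(n))      # inf of f on the cell is f(x[i])
    upper = sum(f(x[i + 1]) * h for i in range(n))  # sup of f on the cell is f(x[i+1])
    return lower, upper

def square(t):
    return t * t

for k in range(1, 9):
    lower, upper = step_integrals(square, 0, 1, 2**k)
    print(2**k, lower, upper)  # lower < 1/3 < upper
```

Because $x^2$ is increasing on $[0, 1]$, the gap between the two step integrals is exactly $(f(1) - f(0))/2^k = 2^{-k}$, so both squeeze down to $1/3$.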

While the Lebesgue integral subsumes the proper Riemann integral, we will see 
some differences in the case of the improper Riemann integral. For now we will settle 
for a single example: 


Example 18.17 

The improper Riemann integral $(R) \int_0^\infty (\sin x / x)\,dx$ exists, while the Lebesgue
integral $(L) \int_0^\infty (\sin x / x)\,dx$ does not.

proof. The improper Riemann integral can be written as an alternating series:

$$\begin{aligned}
(R) \int_0^\infty \frac{\sin x}{x}\,dx
&= \sum_{n=1}^{\infty} \int_{(n-1)\pi}^{n\pi} \frac{\sin x}{x}\,dx
 = \sum_{n=1}^{\infty} (-1)^{n-1} \int_{(n-1)\pi}^{n\pi} \frac{|\sin x|}{x}\,dx \\
&= \sum_{n=1}^{\infty} (-1)^{n-1} \int_0^{\pi} \frac{|\sin x|}{x + (n-1)\pi}\,dx.
\end{aligned}$$

To show that the series converges, we only have to show that the terms tend
monotonically to zero. But $|\sin x| / (x + (n-1)\pi)$ clearly decreases as $n$ increases
(for $x$ fixed), and

$$\int_0^{\pi} \frac{|\sin x|}{x + (n-1)\pi}\,dx \le \frac{1}{n-1} \to 0.$$

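In order that the Lebesgue integral exist, on the other hand, it is necessary to
have $(L) \int_0^\infty (|\sin x| / x)\,dx < \infty$. But, from the Monotone Convergence Theorem,

$$\int_0^\infty \frac{|\sin x|}{x}\,dx = \sum_{n=1}^{\infty} \int_{(n-1)\pi}^{n\pi} \frac{|\sin x|}{x}\,dx \ge \sum_{n=1}^{\infty} \frac{1}{n\pi} \int_0^{\pi} |\sin x|\,dx = \infty. \qquad \square$$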
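The contrast in Example 18.17 is easy to see numerically. The Python sketch below approximates each term $c_n = \int_0^\pi |\sin x| / (x + (n-1)\pi)\,dx$ with composite Simpson’s rule, a quadrature chosen here for illustration and not taken from the text. The alternating partial sums settle near $\pi/2$, the known value of the improper Riemann integral. The partial sums of the $c_n$, which equal $\int_0^{N\pi} |\sin x| / x\,dx$, keep growing roughly like $(2/\pi)\log N$, in line with the bound $c_n \ge 2/(n\pi)$ from the proof. The names `simpson`, `c_n`, and `partial_sums` are illustrative.

```python
import math

def simpson(g, a, b, m=200):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    total += 4 * sum(g(a + i * h) for i in range(1, m, 2))
    total += 2 * sum(g(a + i * h) for i in range(2, m, 2))
    return total * h / 3

def c_n(n):
    """c_n = integral over [0, pi] of |sin x| / (x + (n - 1) pi), computed numerically."""
    shift = (n - 1) * math.pi

    def integrand(x):
        if x + shift == 0.0:      # only for n = 1, x = 0, where |sin x| / x -> 1
            return 1.0
        return abs(math.sin(x)) / (x + shift)

    return simpson(integrand, 0.0, math.pi)

def partial_sums(N):
    """Return (sum of (-1)**(n-1) * c_n, sum of c_n) over n = 1, ..., N."""
    signed = absolute = 0.0
    for n in range(1, N + 1):
        c = c_n(n)
        signed += c if n % 2 == 1 else -c
        absolute += c
    return signed, absolute

for N in (10, 100, 1000):
    signed, absolute = partial_sums(N)
    # signed stays close to pi/2; absolute (the integral of |sin x|/x over [0, N*pi]) keeps growing
    print(N, round(signed, 6), round(math.pi / 2, 6), round(absolute, 4))
```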


As the last example demonstrates, the difference between the improper Riemann 
integral and the Lebesgue integral is roughly the same as the difference between con- 
ditionally convergent series and absolutely convergent series. The improper Riemann 
integral may exist due to the effect of delicate cancellations, while the Lebesgue in- 
tegral does not permit such issues to arise. In any event, please note that there is no 
such thing as an “improper” Lebesgue integral: We made no special assumptions about 
the boundedness of our integrand, or about the boundedness of the set over which we 
integrate. We will say more about the comparisons between the Riemann (and even the 
Riemann-Stieltjes) integral and the Lebesgue integral later.

While we have shown that $L_1$ is both a vector space and a lattice, it is easy to see that
$L_1$ is not an algebra under the usual pointwise multiplication of functions. For example,
if we set $f(x) = x^{-1/2}$ for $0 < x < 1$ and $f(x) = 0$ otherwise, then $f$ is integrable
while $f^2$ is not. See Exercise 26 for a variation on this example.
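The computation behind this example takes one line. Here is a sketch that uses two facts. First, the Monotone Convergence Theorem applies, since $f\chi_{[1/k, 1]}$ and $f^2\chi_{[1/k, 1]}$ increase with $k$ to $f$ and $f^2$. Second, Theorem 18.16 applies on each interval $[1/k, 1]$, where the integrands are continuous:

```latex
\int f = \lim_{k\to\infty} \int_{1/k}^{1} x^{-1/2}\,dx
       = \lim_{k\to\infty} \bigl( 2 - 2k^{-1/2} \bigr) = 2,
\qquad
\int f^2 = \lim_{k\to\infty} \int_{1/k}^{1} x^{-1}\,dx
         = \lim_{k\to\infty} \log k = \infty .
```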

For the remainder of this book, we will assume that all integrals are Lebesgue 
integrals, unless otherwise specified. With very few exceptions, this should cause no 
problems. If / is a nonnegative, continuous function, for example, then the Riemann 
and Lebesgue integrals of / over any interval either both exist (and are equal) or both 
fail to exist. 


EXERCISES

▷ 23. If $(f_n)$ is a sequence of Lebesgue integrable functions on $[a, b]$, and if $f_n \rightrightarrows f$
on $[a, b]$, prove that $f$ is integrable and that $\int_a^b |f_n - f| \to 0$.

24. Prove that $\int_0^\infty e^{-x}\,dx = \lim_{n\to\infty} \int_0^n (1 - (x/n))^n\,dx = 1$. [Hint: For $x$ fixed,
$(1 - (x/n))^n$ increases to $e^{-x}$ as $n \to \infty$.]

25. Compute $\lim_{n\to\infty} \int_0^n (1 - (x/n))^n e^{x/2}\,dx$, justifying your calculations.

26. Let $f(x) = x^{-1/2}$ for $0 < x < 1$ and $f(x) = 0$ otherwise. Let $(r_n)$ be an
enumeration of $\mathbb{Q}$, and let $g(x) = \sum_{n=1}^{\infty} 2^{-n} f(x - r_n)$. Show that:

(a) $g \in L_1$ and, in particular, $g$ is finite a.e.

(b) $g$ is discontinuous at every point and is unbounded on every interval; it remains
so even after modification on an arbitrary set of measure zero.

(c) $g^2$ is finite a.e., but $g^2$ is not integrable on any interval.

27 . Suppose that E C [ 0, 2zr ] is measurable and that f E x n cos jc dx = 0 for all 

n = 0, 1, 2, Show that m(E) = 0. 

28 . Suppose that /, g , and h are measurable and that / < g < h a.e. If / and h 
are Lebesgue integrable, does it follow that g is Lebesgue integrable? Explain. 

29 . If /, ( f n ) are Lebesgue integrable, and if ( f n ) increases pointwise to /, does it 
follow that //„-*//? Explain. 

30. Construct a sequence of integrable functions $(f_n)$ such that $f_n \to 0$ a.e., but
such that $\int |f_n| \not\to 0$. Construct a sequence of integrable functions $(g_n)$ such that
$\int |g_n| \to 0$, but such that $g_n \not\to 0$ a.e.

31. Let $(f_n)$, $f$ be integrable. If $\int |f_n - f| \to 0$, show that $\int f_n \to \int f$ and
$\int |f_n| \to \int |f|$.

32. Let $(f_n)$, $f$ be integrable, and suppose that $\int |f_n - f| \to 0$. Show that
$\int_E f_n \to \int_E f$ for all measurable sets $E$, and that $\int f_n^+ \to \int f^+$.

33. Let $f$ be measurable. Prove that $f$ is Lebesgue integrable if and only if
$\sum_{n=-\infty}^{\infty} 2^n\, m\{|f| > 2^n\} < \infty$.

34. Let $f$ be Lebesgue integrable. Given $\varepsilon > 0$, show that there is a measurable set
$E$ with $m(E) < \infty$ such that $\int_E |f| > \int |f| - \varepsilon$. Moreover, show that $E$ can be
chosen so that $f$ is bounded on $E$.

35. If $f$ is Lebesgue integrable, prove that the function $F(x) = \int_{-\infty}^x f$ is continuous. In fact, even more is true: Given $\varepsilon > 0$, show that there is a $\delta > 0$ such
that $\int_E |f| < \varepsilon$ whenever $m(E) < \delta$. [Hint: This is easy if $f$ is bounded; see
Exercise 34.]

36. Suppose that $f$, $(f_n)$ are measurable and uniformly bounded on $[a,b]$. If
$f_n \to f$ on $[a,b]$, prove that $\int_a^b |f_n - f| \to 0$. [Hint: Egorov’s theorem.]


We are almost ready to define a norm on $L_1$; one final observation will come in
handy.

Lemma 18.18 For $f, g \in L_1$, the following are equivalent:

(i) $f = g$ a.e.

(ii) $\int |f - g| = 0$.

(iii) $\int_E f = \int_E g$ for every measurable set $E$.

proof. That (i) and (ii) are equivalent is easy: From Lemma 18.10, we have
$$\int |f - g| = 0 \iff |f - g| = 0 \text{ a.e.} \iff f = g \text{ a.e.}$$

Now, for (ii) $\Longrightarrow$ (iii), note that
$$\left| \int_E f - \int_E g \right| \le \int_E |f - g| \le \int |f - g| = 0.$$

Finally, for (iii) $\Longrightarrow$ (ii), let $E = \{f - g > 0\}$. Then,
$$\int |f - g| = \int_E (f - g) + \int_{E^c} (g - f) = 0. \qquad \square$$

We have a natural choice for a norm on $L_1$, namely, $\|f\|_1 = \int |f|$. In other words,
$d(f,g) = \int |f - g|$ would appear to be a good guess for a metric on $L_1$. Unfortunately,
this will not quite work since $d(f,g) = 0$ only means that $f = g$ a.e. To remedy this,
we will simply identify functions that are almost everywhere equal.

Formally, we define an equivalence relation on $L_1$ by taking $f \sim g$ to mean that
$f = g$ a.e., and we denote the equivalence classes under $\sim$ by $[f]$. It is easy to check
that the collection of all equivalence classes is again a vector space and a lattice under
the operations $a[f] = [af]$ for $a \in \mathbb{R}$, $[f] + [g] = [f + g]$, and $[f] \le [g]$ whenever
$f \le g$ a.e. What’s more, $\|[f]\|_1 = \int |f|$ now defines a norm on the collection of all
equivalence classes.

For all practical purposes, we need not bother with the formalities outlined above;
after all, we are typically interested in specific, concrete functions. But, if we want
to consider $L_1$ as a normed linear space under its natural “norm” (and we do!), then
we will want to modify our definition of $L_1$. Henceforth, we will consider $L_1$ to be
the collection of all equivalence classes of integrable functions under equality almost
everywhere. In symbols, we identify $f$ with $[f]$, and we define $\|f\|_1 = \|[f]\|_1 = \int |f|$.
It is not hard to see that this “new $L_1$” is, indeed, a normed vector space and a normed
lattice under $\|\cdot\|_1$. In particular, notice that $\|f\|_1 \le \|g\|_1$ whenever $|f| \le |g|$ a.e.


EXERCISE 

37. Check that the operations $a[f] = [af]$ for $a \in \mathbb{R}$, $[f] + [g] = [f + g]$,
and $[f] \le [g]$ whenever $f \le g$ a.e. are well defined, and that the collection of
equivalence classes is a vector lattice when supplied with this arithmetic. What is
$|[f]|$ in this lattice? Is it $[\,|f|\,]$?


Lebesgue’s Dominated Convergence Theorem 

Now that we have a norm on $L_1$, the next question is whether $L_1$ is complete. The first
step that we will take in this direction is to prove a truly wonderful theorem, one that
de la Vallée Poussin called the “crowning achievement” of Lebesgue’s work.
Dominated Convergence Theorem 18.19 Let $(f_n)$ be a sequence in $L_1$ and
suppose that $f_n \to f$ pointwise. If the $f_n$ are all dominated by a single $L_1$
function, that is, if $|f_n| \le g$ for all $n$, where $g \in L_1$, then we have $f \in L_1$ and
$$\int f_n \to \int f \quad \text{as } n \to \infty.$$

proof. Since $f_n \to f$ and $|f_n| \le g$ for all $n$, we must also have $|f| \le g$. Since $g \in L_1$, this means
that $f \in L_1$ and $\int |f| \le \int g$.

The proof that $\int f_n \to \int f$ consists of a clever application of Fatou’s lemma.
Notice that each of the sequences $(g + f_n)$ and $(g - f_n)$ consists of nonnegative
functions, and that $g + f_n \to g + f$ and $g - f_n \to g - f$. Now we unleash Fatou:
First,
$$\int g + \int f = \int (g + f) \le \liminf_{n\to\infty} \int (g + f_n) = \int g + \liminf_{n\to\infty} \int f_n;$$
thus, since $\int g$ is finite, $\int f \le \liminf_{n\to\infty} \int f_n$. Next,
$$\int g - \int f = \int (g - f) \le \liminf_{n\to\infty} \int (g - f_n) = \int g - \limsup_{n\to\infty} \int f_n.$$
Thus, $\limsup_{n\to\infty} \int f_n \le \int f$. Aha! $\limsup_{n\to\infty} \int f_n \le \int f \le \liminf_{n\to\infty} \int f_n$,
and so $\lim_{n\to\infty} \int f_n$ exists and equals $\int f$. $\square$


Note that the “domination condition,” $|f_n| \le g$ for all $n$, where $g \in L_1$, is equivalent
to the requirement that $\sup_n |f_n|$ be integrable.
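The role of the domination hypothesis is easy to see numerically. The minimal sketch below (our own illustration, not from the text; the helper names are hypothetical) approximates $\int_0^1 f_n$ by the composite midpoint rule for two sequences that both tend to 0 a.e. on $[0,1]$: $f_n(x) = x^n$, which is dominated by the integrable function 1, and $f_n = n\chi_{(0,1/n)}$, for which $\sup_n f_n(x)$ behaves like $1/x$ near 0 and so is not integrable. Only the first sequence has integrals tending to $\int 0 = 0$.

```python
def midpoint_integral(f, a, b, n_cells=10_000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / n_cells
    return h * sum(f(a + (k + 0.5) * h) for k in range(n_cells))


def dominated(n):
    """f_n(x) = x**n on [0, 1]; |f_n| <= 1, and 1 is integrable on [0, 1]."""
    return lambda x: x ** n


def undominated(n):
    """f_n = n * indicator of (0, 1/n); f_n -> 0 pointwise, but sup_n f_n ~ 1/x."""
    return lambda x: float(n) if 0 < x < 1 / n else 0.0


for n in (10, 100, 1000):
    print(n, midpoint_integral(dominated(n), 0, 1),
          midpoint_integral(undominated(n), 0, 1))
```

Because 10,000 is a multiple of each $n$ used, the midpoint rule gives $\int n\chi_{(0,1/n)} = 1$ up to rounding, so the second column stays at 1 while the first is close to $1/(n+1)$.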

By discarding countably many sets of measure zero, we may weaken the hypotheses
of the Dominated Convergence Theorem by requiring only that $|f_n| \le g$ a.e. and that
$f_n \to f$ a.e. What’s more, by applying the theorem instead to the sequence $(|f_n - f|)$,
noting that $|f_n - f| \to 0$ a.e. and $|f_n - f| \le 2g$ a.e., we actually get a stronger
conclusion:


Corollary 18.20 Let $(f_n)$ be a sequence in $L_1$. Suppose that $f_n \to f$ a.e. and
that $|f_n| \le g$ a.e. for all $n$, where $g \in L_1$. Then, $f \in L_1$ and $\int |f_n - f| \to 0$ as
$n \to \infty$. That is, $\|f_n - f\|_1 \to 0$ as $n \to \infty$.

We will have many opportunities to use the Dominated Convergence Theorem. Here 
are three quick applications that demonstrate its utility (compare the first of these with 
Exercise 35). 

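Corollary 18.21 If $f \in L_1$, then $F(x) = \int_{-\infty}^x f$ is continuous.

proof. If $x_n \to x$, then $f\chi_{(-\infty, x_n]} \to f\chi_{(-\infty, x]}$ a.e. (Why?) Also,
$|f\chi_{(-\infty, x_n]}| \le |f| \in L_1$. Thus, by the Dominated Convergence Theorem,
$\int_{-\infty}^{x_n} f \to \int_{-\infty}^{x} f$; that is, $F(x_n) \to F(x)$. $\square$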
Corollary 18.22 If $f \in L_1$, then
$$\int_{-\infty}^{\infty} f = \lim_{a\to-\infty} \lim_{b\to\infty} \int_a^b f = \lim_{b\to\infty} \lim_{a\to-\infty} \int_a^b f,$$
and for $a, b \in \mathbb{R}$, $a < b$,
$$\int_a^b f = \lim_{\varepsilon\to 0^+} \int_{a+\varepsilon}^b f = \lim_{\varepsilon\to 0^+} \int_a^{b-\varepsilon} f.$$


This “continuous parameter” application of the Dominated Convergence Theorem 
is proved by applying the theorem to all possible sequential limits. Along similar lines, 
we can easily derive another comparison with the improper Riemann integral. 


Corollary 18.23 Suppose that $f : [0,\infty) \to \mathbb{R}$ is Riemann integrable on $[a,b]$
for every $0 < a < b < \infty$, and that
$$(IR)\int_0^\infty |f(t)|\,dt = \lim_{a\to 0^+} \lim_{b\to\infty} \int_a^b |f|$$
exists. Then, $(IR)\int_0^\infty f(t)\,dt$ and $(L)\int_0^\infty f$ both exist and are equal.


proof. Since $f \in \mathcal{R}[a,b]$, we know that $f \in L_1[a,b]$, that $(R)\int_a^b f = (L)\int_a^b f$,
and that $(R)\int_a^b |f| = (L)\int_a^b |f|$ for any $0 < a < b < \infty$. Moreover, since the
restriction of $f$ to $[a,b]$ is measurable for all $0 < a < b < \infty$, $f$ is clearly
measurable on $[0,\infty)$. An appeal to the Monotone Convergence Theorem now
shows that $f \in L_1[0,\infty)$:
$$(L)\int_0^\infty |f| = \lim_{n\to\infty} \int_{1/n}^n |f| \le (IR)\int_0^\infty |f| < \infty.$$
It now follows from Corollary 18.22 that
$$(IR)\int_0^\infty f = \lim_{a\to 0^+} \lim_{b\to\infty} \int_a^b f = (L)\int_0^\infty f.$$
In fact, Corollary 18.22 also shows that $(L)\int_0^\infty |f| = (IR)\int_0^\infty |f|$. $\square$


Just as with the Monotone Convergence Theorem, it is useful to have a version of 
the Dominated Convergence Theorem written in terms of series of functions. 


Theorem 18.24 Let $(f_n)$ be a sequence of integrable functions such that
$\sum_{n=1}^\infty \int |f_n| < \infty$. Then, $\sum_{n=1}^\infty f_n$ converges a.e. to an integrable function. Moreover,
$$\int \Big( \sum_{n=1}^\infty f_n \Big) = \sum_{n=1}^\infty \int f_n.$$

proof. Consider $g = \sum_{n=1}^\infty |f_n|$. From the Monotone Convergence Theorem we
know that $\int g = \sum_{n=1}^\infty \int |f_n| < \infty$. Thus, $g$ is integrable and, most importantly,
$g$ is finite a.e. What this means is that
$$\sum_{n=1}^\infty |f_n| < \infty \quad \text{a.e.}$$
That is, $\sum_{n=1}^\infty f_n$ converges absolutely a.e. to a finite limit $f$ that satisfies $|f| \le g$
a.e. And so $f$ is integrable. Of course, $|\int f| \le \int |f| \le \int g$, which proves the first
assertion of the theorem.

Notice, too, that the series $\sum_{n=1}^\infty \int f_n$ converges; in fact, it is even absolutely
summable:
$$\sum_{n=1}^\infty \left| \int f_n \right| \le \sum_{n=1}^\infty \int |f_n| = \int g < \infty.$$

To prove the second claim, we apply the Dominated Convergence Theorem
to the sequence of partial sums. Notice that $|\sum_{n=1}^N f_n| \le g$ a.e. and
$\sum_{n=1}^N f_n \to \sum_{n=1}^\infty f_n$ a.e. Hence,
$$\int \sum_{n=1}^\infty f_n = \lim_{N\to\infty} \int \sum_{n=1}^N f_n = \lim_{N\to\infty} \sum_{n=1}^N \int f_n = \sum_{n=1}^\infty \int f_n. \qquad \square$$

By applying Theorem 18.24 a second time to $\sum_{n=N+1}^\infty f_n$, the tail of the series, we
get a much stronger result. If $f = \sum_{n=1}^\infty f_n$, then, as $N \to \infty$,
$$\int \left| f - \sum_{n=1}^N f_n \right| = \int \left| \sum_{n=N+1}^\infty f_n \right| \le \sum_{n=N+1}^\infty \int |f_n| \to 0.$$
That is, the series $\sum_{n=1}^\infty f_n$ converges in the norm of $L_1$. In brief, Theorem 18.24 shows
that
$$\sum_{n=1}^\infty \|f_n\|_1 < \infty \implies \sum_{n=1}^\infty f_n \text{ converges in } L_1.$$

By Theorem 7.12, this proves:

Corollary 18.25 $L_1$ is complete.
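A concrete instance of Theorem 18.24, worked in exact rational arithmetic, may help (this is our own example, not the text's; the function names are hypothetical). Take $f_n(x) = x^n(1 - x)$ on $[0,1]$. Then $\int |f_n| = 1/((n+1)(n+2))$, so $\sum \int |f_n| < \infty$; the partial sums telescope to $S_N(x) = x - x^{N+1}$, which converge to $x$ for $0 \le x < 1$, that is, a.e.; and $\int_0^1 x\,dx = 1/2$. The sketch checks that $\sum_{n \le N} \int f_n = 1/2 - 1/(N+2)$ and that the $L_1$ distance $\int_0^1 |x - S_N(x)|\,dx = 1/(N+2)$ tends to 0, as the argument above predicts.

```python
from fractions import Fraction


def integral_of_term(n):
    """Exact integral over [0, 1] of f_n(x) = x**n * (1 - x); note f_n >= 0 there."""
    # 1/(n+1) - 1/(n+2) = 1/((n+1)(n+2))
    return Fraction(1, (n + 1) * (n + 2))


def sum_of_integrals(N):
    """sum_{n=1}^{N} of the integral of |f_n| (equal to the integral of f_n here)."""
    return sum((integral_of_term(n) for n in range(1, N + 1)), Fraction(0))


def partial_sum(x, N):
    """S_N(x) = sum_{n=1}^{N} x**n * (1 - x), which telescopes to x - x**(N+1)."""
    return sum((x ** n * (1 - x) for n in range(1, N + 1)), Fraction(0))


def l1_distance_to_limit(N):
    """Integral over [0, 1] of |x - S_N(x)|.

    Since x - S_N(x) = x**(N+1) >= 0, this equals (integral of x) minus
    (integral of S_N), i.e. 1/2 - sum_of_integrals(N).
    """
    return Fraction(1, 2) - sum_of_integrals(N)


for N in (1, 10, 100):
    print(N, sum_of_integrals(N), l1_distance_to_limit(N))
```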

It also follows from Theorem 18.24 that if $f \in L_1$, then the set function $\mu(E) = \int_E f$,
$E \in \mathcal{M}$, is countably additive. But since $f$ is not necessarily nonnegative, $\mu$ will no
longer be nonnegative or, indeed, monotone. On the other hand, $\mu$ is quite well behaved.
First, since $f$ is integrable, $\mu(E) \in \mathbb{R}$ for any $E \in \mathcal{M}$; in fact,
$$|\mu(E)| = \left| \int_E f \right| \le \int_E |f| \le \int |f| < \infty.$$
Second, if $(E_n)$ is a sequence of pairwise disjoint, measurable sets, and if $E = \bigcup_{n=1}^\infty E_n$,
then $\sum_{n=1}^\infty \mu(E_n) = \sum_{n=1}^\infty \int_{E_n} f$ is absolutely convergent with sum $\mu(E) = \int_E f$. In
this case, we say that $\mu$ is a signed measure.

Corollary 18.26 If $f \in L_1$, then the map $E \mapsto \int_E f$ is a signed measure on $\mathcal{M}$.

In particular, if $(E_n)$ is any sequence of pairwise disjoint, measurable sets, and if
$E = \bigcup_{n=1}^\infty E_n$, then
$$\int_E f = \sum_{n=1}^\infty \int_{E_n} f.$$
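Both features of $\mu(E) = \int_E f$, the failure of monotonicity and the countable additivity, can be checked by hand for a simple choice. The minimal sketch below assumes, purely for illustration, $f(x) = 1 - 2x$ on $[0,1]$ and $f = 0$ elsewhere, so that $\mu([a,b]) = (b - a) - (b^2 - a^2)$ for $0 \le a \le b \le 1$ (endpoints are irrelevant, since points have measure zero). It shows $\mu([0,\tfrac12]) = \tfrac14 > 0 = \mu([0,1])$, and it sums $\mu$ over the pairwise disjoint intervals $E_j = [1 - 2^{-(j-1)}, 1 - 2^{-j})$, whose union is $[0,1)$. The function names are hypothetical.

```python
from fractions import Fraction


def mu(a, b):
    """mu([a, b]) = integral_a^b (1 - 2t) dt for 0 <= a <= b <= 1 (hypothetical f = 1 - 2x)."""
    a, b = Fraction(a), Fraction(b)
    return (b - a) - (b * b - a * a)


def disjoint_pieces(J):
    """E_j = [1 - 2^{-(j-1)}, 1 - 2^{-j}) for j = 1..J; their union is [0, 1 - 2^{-J})."""
    return [(1 - Fraction(1, 2 ** (j - 1)), 1 - Fraction(1, 2 ** j))
            for j in range(1, J + 1)]


def partial_sum(J):
    """sum_{j=1}^{J} mu(E_j)."""
    return sum((mu(a, b) for a, b in disjoint_pieces(J)), Fraction(0))


half = Fraction(1, 2)
print(mu(0, half), mu(0, 1), mu(half, 1))  # 1/4 0 -1/4
```

The partial sums equal $\mu([0, 1 - 2^{-J}]) = (1 - 2^{-J})\,2^{-J}$ and so tend to $\mu([0,1)) = 0$; every value stays within the bound $|\mu(E)| \le \int |f| = 1/2$.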


EXERCISES 

38. If $f \in L_1[0,1]$, show that $x^n f(x) \in L_1[0,1]$ for $n = 1, 2, \ldots$ and compute
$\lim_{n\to\infty} \int_0^1 x^n f(x)\,dx$.

39. Compute /; /2 (i - cos jc dx. Justify your calculations. 

▷ 40. Let $(f_n)$, $(g_n)$, and $g$ be integrable, and suppose that $f_n \to f$ a.e., $g_n \to g$
a.e., $|f_n| \le g_n$ a.e., for all $n$, and that $\int g_n \to \int g$. Prove that $f \in L_1$
and that $\int f_n \to \int f$. [Hint: Revise the proof of the Dominated Convergence
Theorem.]

41. Let $(f_n)$, $f$ be integrable, and suppose that $f_n \to f$ a.e. Prove that $\int |f_n - f| \to 0$
if and only if $\int |f_n| \to \int |f|$.

42. Let $(f_n)$ be a sequence of integrable functions and suppose that $|f_n| \le g$ a.e.,
for all $n$, for some integrable function $g$. Prove that
$$\int \Big( \liminf_{n\to\infty} f_n \Big) \le \liminf_{n\to\infty} \int f_n \le \limsup_{n\to\infty} \int f_n \le \int \Big( \limsup_{n\to\infty} f_n \Big).$$

43. Let $f$ be measurable and finite a.e. on $[0,1]$.

(a) If $\int_E f = 0$ for all measurable $E \subset [0,1]$ with $m(E) = 1/2$, prove that $f = 0$
a.e. on $[0,1]$.

(b) If $f > 0$ a.e., show that $\inf\{\int_E f : m(E) \ge 1/2\} > 0$.

44. Show that $\lim_{n\to\infty} \int_0^1 f_n = 0$ where $f_n(x)$ is:


(b) 

(c) 

(d) 


n^/x 
1 + n 2 x 2 
n x log x 
1 + n 2 x 2 

n i/2 x 

1 +n 2 * 2 ' 


[Hint: $1 + n^2x^2 \ge 2nx$.]
45. Find:

(a) $\lim_{n\to\infty} \int_0^\infty \frac{\sin(e^x)}{1 + nx^2}\,dx$ (b) $\lim_{n\to\infty} \int_0^1 \frac{n\cos x}{1 + n^2x^{3/2}}\,dx$.

46. Fix $0 < a < b$, and define $f_n(x) = a e^{-nax} - b e^{-nbx}$. Show that $\sum_{n=1}^\infty \int_0^\infty |f_n| = \infty$
and $\int_0^\infty \big( \sum_{n=1}^\infty f_n \big) \ne \sum_{n=1}^\infty \int_0^\infty f_n$.

47. Compute the following limits, justifying your calculations:

(a) $\lim_{n\to\infty} \int_0^\infty \frac{\sin(x/n)}{(1 + x/n)^n}\,dx$

(b) $\lim_{n\to\infty} \int_0^1 \frac{1 + nx^2}{(1 + x^2)^n}\,dx$

(c) $\lim_{n\to\infty} \int_0^\infty \frac{n\sin(x/n)}{x(1 + x^2)}\,dx$

(d) $\lim_{n\to\infty} \int_a^\infty \frac{n}{1 + n^2x^2}\,dx$.


[The answer in (d) depends on whether a > 0, a = 0, or a < 0. How is this reconciled 
by the various convergence theorems?] 

48. Let $\alpha, \beta \in \mathbb{R}$, and define $f(x) = x^\alpha \sin(x^\beta)$, $0 < x \le 1$. For what values of
$\alpha$ and $\beta$ is $f$: (i) Lebesgue integrable? (ii) Riemann integrable (in the sense that
$\lim_{\varepsilon\to 0^+} \int_\varepsilon^1 f(x)\,dx$ exists)?

49. For which $a \in \mathbb{R}$ is $\sum_{n=1}^\infty n^{-a} e^{-nx}$ continuous on $[0,\infty)$? in $L_1[0,\infty)$?

50. Let f(x) = / n ^ e ~" (x n) f° r x 6 R. Is / in L|(R)? continuous on 

R ? differentiable on R ? 


51. Let $(f_n)$ be a sequence of measurable functions with $|f_n| \le g$ for all $n$, where
$g \in L_1$. If $f_n \to f$ a.e., prove that $f_n \to f$ almost uniformly. In other words, show
that the conclusion of Egorov’s theorem remains valid under the hypotheses of the
Dominated Convergence Theorem. [Hint: In the notation of the proof of Theorem
17.13, it is enough to show that, for fixed $k$, some $E(n,k)$ has finite measure. Show
that $E(n,k) \subset \{2g \ge 1/k\}$ in this case and argue that $m\{2g \ge 1/k\} < \infty$.]


Approximation of Integrable Functions 

As a final installment in our discussion of the structure of Lebesgue integrable functions,
we return to our Basic Construction and uncover a long list of dense subsets of $L_1$.

Theorem 18.27 Let $f$ be Lebesgue integrable on $\mathbb{R}$, and let $\varepsilon > 0$. Then:

(i) There is an integrable simple function $\varphi$ with $\int |f - \varphi| < \varepsilon$.

(ii) There is a continuous function $g : \mathbb{R} \to \mathbb{R}$ such that $g = 0$ outside some
bounded interval and such that $\int |f - g| < \varepsilon$.

(iii) There is an (integrable) step function $h$ with $\int |f - h| < \varepsilon$.

proof. From the Monotone Convergence Theorem we can find a compact
interval $[a,b]$ such that $\int_{\mathbb{R}\setminus[a,b]} |f| < \varepsilon/4$. We will build all of our approximating
functions with support in $[a,b]$; that is, each will be chosen to vanish outside of
$[a,b]$.

(i) There is a sequence of (integrable) simple functions $(\varphi_k)$ with $\varphi_k = 0$
off $[a,b]$, $\varphi_k \to f$ on $[a,b]$, and $|\varphi_k| \le |f|$. It follows from the Dominated
Convergence Theorem that $\int_a^b |f - \varphi_k| \to 0$. Now choose $k$ and $\varphi = \varphi_k$ with
$\int_a^b |f - \varphi| < \varepsilon/4$ and, hence, $\int_{\mathbb{R}} |f - \varphi| < \varepsilon/2$.

(ii) The function $\varphi$ is bounded; choose $K$ such that $|\varphi| \le K$. Now, from
Theorem 17.20 (and Exercise 17.45), we can find a continuous function $g$ on
$\mathbb{R}$, vanishing outside of $[a,b]$, that satisfies $|g| \le K$ and $m\{g \ne \varphi\} < \varepsilon/(8K)$.
Thus,
$$\int |\varphi - g| = \int_{\{\varphi \ne g\}} |\varphi - g| \le 2K\, m\{\varphi \ne g\} < 2K \cdot \frac{\varepsilon}{8K} = \frac{\varepsilon}{4},$$
and hence $\int |f - g| \le \int |f - \varphi| + \int |\varphi - g| < 3\varepsilon/4$.

(iii) From Lemma 12.2 we know that every continuous function on $[a,b]$ can
be uniformly approximated by a step function. In particular, we can find a step
function $h$ on $[a,b]$ (extended to be $0$ off $[a,b]$) such that $\|g - h\|_\infty < \varepsilon/[4(b - a)]$. Thus,
$$\int |g - h| = \int_a^b |g - h| \le (b - a)\,\|g - h\|_\infty < \frac{\varepsilon}{4},$$
and hence $\int |f - h| \le \int |f - g| + \int |g - h| < \varepsilon$. $\square$
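For a monotone integrand, part (iii) can be watched directly, without the three-stage construction used in the proof. The sketch below is our own example and takes a shortcut that only works because the chosen function is decreasing: $f(x) = x^{-1/2}$ on $(0,1]$ (unbounded, but with $\int_0^1 f = 2$), and the step function $h_k$ equal to $f((j+1)/k)$ on $(j/k, (j+1)/k]$ for $j = 0, \ldots, k-1$ and $0$ elsewhere. Since $0 \le h_k \le f$, the $L_1$ error is $\int |f - h_k| = 2 - \frac1k \sum_{j=1}^k f(j/k)$.

```python
import math


def step_error(k):
    """L1 distance between f(x) = x**-0.5 on (0, 1] and the step function h_k.

    h_k equals f((j+1)/k) on (j/k, (j+1)/k], j = 0..k-1, and 0 elsewhere. Since f is
    decreasing, 0 <= h_k <= f there, so the error is 2 - (1/k) * sum_{j=1}^{k} f(j/k).
    """
    riemann_sum = sum(math.sqrt(k / j) for j in range(1, k + 1)) / k
    return 2.0 - riemann_sum


for k in (10, 100, 1000, 10_000):
    print(k, step_error(k))
```

The error decreases roughly like a constant times $k^{-1/2}$, so any prescribed $\varepsilon > 0$ is eventually met, as part (iii) asserts.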


Corollary 18.28 $C[a,b]$ is dense in $L_1[a,b]$. In fact, given $f \in L_1[a,b]$ and
$\varepsilon > 0$, there exists a polynomial $p$ with rational coefficients satisfying
$\int_a^b |f - p| < \varepsilon$. Consequently, $L_1[a,b]$ is separable.


EXERCISES 

▷ 52. Prove Corollary 18.28.

▷ 53. Prove that $L_1(\mathbb{R})$ is separable.

54. Given $f \in L_1(\mathbb{R})$ and $\varepsilon > 0$, show that there is an infinitely differentiable
function $\psi \in C^\infty(\mathbb{R})$ such that $\psi = 0$ outside some bounded interval, and such that
$\int |f - \psi| < \varepsilon$. [Hint: Review the proof of Theorem 11.12.]

▷ 55. Prove the Riemann-Lebesgue lemma: If $f$ is integrable on $\mathbb{R}$, then $f(x)\cos nx$
is integrable and $\lim_{n\to\infty} \int f(x)\cos nx\,dx = 0$. The same is true with $\sin nx$ in
place of $\cos nx$. [Hint: First try $f = \chi_{[a,b]}$.]

56. Given $f \in L_1(\mathbb{R})$, define $g(x) = \int_{-\infty}^{\infty} f(t)\sin(xt)\,dt$ for $x \in \mathbb{R}$. Show that $g$
is continuous on $\mathbb{R}$ and that $g(x) \to 0$ as $x \to \pm\infty$; hence, $g$ is uniformly continuous
on $\mathbb{R}$.

▷ 57. Prove the following statements, where $f : \mathbb{R} \to \mathbb{R}$.

(a) If $f$ is measurable, then so is $g(x) = f(x + t)$ for any $t$.

(b) If $f$ is integrable, then so is $g(x) = f(x + t)$ and $\int f = \int g$. [Hint: This is easy
if $f$ is a step function.]

(c) If $f$ is integrable, then $\lim_{t\to 0} \int |f(x) - f(x + t)|\,dx = 0$.

(d) If $f$ is integrable, find $\lim_{t\to\infty} \int |f(x) - f(x + t)|\,dx$.

58. Prove the following statements, where $f : \mathbb{R} \to \mathbb{R}$.

(a) If $f$ is measurable, then so is $g(x) = f(ax)$ for any $a$.

(b) If $f$ is integrable, and if $a \ne 0$, then $g(x) = f(ax)$ is integrable and
$\int f = |a| \int g$. [Hint: This is easy if $f$ is a step function.]

(c) If $f$ is integrable, then $\lim_{a\to\infty} \int f(ax)\,dx = 0$.

59 . Let / £ Lj(R) and define 00) = + (l//i)). Show that (p is 

integrable and that f f = f (p. 

60. (a) Show that there is a sequence of polynomials $(P_n)$ such that $P_n \to 0$ pointwise
on $[0,1]$, but with $\int_0^1 P_n(x)\,dx \to 3$.

(b) Find $\int_0^1 \sup_n |P_n(x)|\,dx$ for this sequence of polynomials.

▷ 61. Given $f \in L_1[0,2\pi]$ and $\varepsilon > 0$, show that there is a trig polynomial $T$ such
that $\int_0^{2\pi} |f - T| < \varepsilon$. [Hint: The proof of Theorem 18.27 (ii) shows that there is
a continuous function $g$ with $g(0) = 0 = g(2\pi)$ such that $\int_0^{2\pi} |f - g| < \varepsilon/2$.
By setting $g(x \pm 2n\pi) = g(x)$ for any $n \in \mathbb{N}$, we may now assume that
$g \in C^{2\pi}$.]


Notes and Remarks 

While we have chosen an approach to defining the Lebesgue integral that is similar 
in spirit to Lebesgue’s original presentation, there are many other equally viable ap- 
proaches, including the familiar “area under the graph” approach (see Wheeden and 
Zygmund [1977]), the “upper and lower integral” approach (see Apostol [1975]), the 
“limit of step functions” approach (see Chae [1980] or Riesz and Sz.-Nagy [1955]), 
and at least one approach that avoids measure theory altogether (see Van Daele [1990], 
for example). But while several authors take the “simple function” approach, not so 
many bother to check the details. The particulars here are based in part on the painstak- 
ing presentations in the books by Folland [1984] and by Royden [1963]. Once the 
Lebesgue integral has been defined for nonnegative measurable functions, however, 
the differences between the various approaches tend to fade. Bear this in mind should 
you consult one of the references given below. 

The articles by Bliss [1917], Gillman [1993], Goffman [1953b], Hildebrandt [1917],
and Riesz [1920, 1936, 1949], together with Hawkins [1970], and Lebesgue’s own
Leçons, Lebesgue [1928], include discussions of some of the alternative approaches to
integration and their history. The books by Folland [1984], Hewitt and Stromberg
[1965], Royden [1963], and Rudin [1966] include various abstractions and
generalizations on these themes.

Exercise 44 is taken from De Barra [1974]. Exercises 26 and 47 are taken from 
Folland [1984]. Exercises 38, 39, and 59 are based on exercises in Torchinsky [1988].
A short proof of Exercise 41 (in a more general setting) is given in Novinger [1992].
Exercises 43 and 60 are taken from notes for a course on real analysis offered by W. B.
Johnson at The Ohio State University in 1974-75. The result stated in Exercise 55 is
sometimes called Mercer’s theorem; many authors refer to the result in Exercise 56 as
the Riemann-Lebesgue lemma. Lebesgue’s version of the lemma appears in Lebesgue 
[1906]. 



CHAPTER NINETEEN 


Additional Topics 


We continue our study of Lebesgue measure and integration by pursuing a few additional 
topics of interest. Since we have already been afforded some practice with the basic 
ideas in earlier chapters, the presentation of these extra topics will be streamlined by 
relegating a larger proportion of the details to the exercises. 


Convergence in Measure 

We have now seen several modes or types of convergence for sequences of real-valued 
functions. In this section we will discuss yet another mode of convergence, called con- 
vergence in measure. To motivate this new notion, let’s begin with a simple observation. 

Suppose that $(f_n)$ is a sequence of integrable functions that converges in $L_1$ to some
(integrable) function $f$. Can we claim that $(f_n)$ converges pointwise a.e. to $f$? Well,
not exactly (see Exercise 1, below), but we can at least make this claim: Given $\varepsilon > 0$,
Chebyshev’s inequality tells us that
$$m\{|f_n - f| > \varepsilon\} \le \frac{1}{\varepsilon} \int |f_n - f| \to 0$$
as $n \to \infty$. In other words, $(f_n)$ cannot get too far away from $f$ “in measure.” Let’s
give a name to this new phenomenon.
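Chebyshev’s inequality is easy to test on a concrete example (ours, not the text’s; the helper names are hypothetical). On $[0,1]$ take $f_n(x) = x^n$ and $f = 0$. For $0 < \varepsilon < 1$ we have $m\{x^n > \varepsilon\} = 1 - \varepsilon^{1/n}$ exactly, while $\frac{1}{\varepsilon}\int_0^1 x^n\,dx = \frac{1}{\varepsilon(n+1)}$. The sketch below compares the two quantities; both tend to 0 as $n \to \infty$, as the inequality predicts for a sequence converging to 0 in $L_1$.

```python
def measure_exceeding(n, eps):
    """Exact m{x in [0, 1] : x**n > eps} for 0 < eps < 1, namely 1 - eps**(1/n)."""
    return 1.0 - eps ** (1.0 / n)


def chebyshev_bound(n, eps):
    """(1/eps) * integral_0^1 x**n dx = 1 / (eps * (n + 1))."""
    return 1.0 / (eps * (n + 1))


for n in (1, 10, 100, 1000):
    print(n, measure_exceeding(n, 0.5), chebyshev_bound(n, 0.5))
```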

Throughout, we let $f$ and $(f_n)$ be measurable, real-valued functions defined on some
common measurable domain $D \subset \mathbb{R}$. We say that $(f_n)$ converges in measure to $f$ on
the set $D$ if, for each $\varepsilon > 0$, we have
$$m\{x \in D : |f_n(x) - f(x)| > \varepsilon\} \to 0 \quad \text{as } n \to \infty.$$
Equivalently, $(f_n)$ converges in measure to $f$ if and only if, given $\varepsilon > 0$, there exists an
$N$ such that
$$m\{|f_n - f| > \varepsilon\} < \varepsilon \quad \text{for all } n \ge N$$
(see Exercise 10). We will occasionally abbreviate these statements by using the
suggestive shorthand $f_n \xrightarrow{m} f$.

Our goal in this section is to investigate this newest mode of convergence and to 
answer the question: Does convergence in measure tell us anything about pointwise 
convergence? 
Examples 19.1

(a) Clearly, $(f_n)$ converges in measure to $f$ if and only if $(f_n - f)$ converges in
measure to 0. Thus, as with most of the modes of convergence we are familiar
with, null sequences are again the general case.

(b) While convergence in measure is implied by convergence in $L_1$, it is by no
means the same thing. Consider, for example, the sequence $f_n = n\chi_{(0,1/n)}$ on
$D = [0,1]$. We clearly have $f_n \xrightarrow{m} 0$, as well as $f_n \to 0$ pointwise, but $f_n \not\to 0$
in $L_1$ since $\int f_n = 1$ for each $n$. In fact, $(f_n)$ is not even Cauchy in $L_1$ since
$\int |f_{2n} - f_n| = 1$ for every $n$.

(c) Convergence in measure is not implied by pointwise convergence, in general.
The sequence $f_n = \chi_{[n, n+1]}$ converges pointwise to 0 on $[0,\infty)$, for example,
while $m\{|f_n| > \varepsilon\} = 1$ for any $0 < \varepsilon < 1$. Along similar lines, the sequence
$g_n(x) = x/n$ converges pointwise to 0 on $[0,\infty)$, but $m\{g_n > \varepsilon\} = \infty$ for any
$\varepsilon > 0$.

(d) Nor is pointwise convergence implied by convergence in measure. To see this,
we will need to construct a somewhat more elaborate example: For each
$n = 0, 1, 2, \ldots$ and each $k = 0, 1, \ldots, 2^n - 1$ we put $E_{k+2^n} = [k2^{-n}, (k+1)2^{-n}]$; in so
doing, we enumerate the subintervals of $[0,1]$ with consecutive dyadic rational
endpoints as a sequence $(E_j)$. Now the sequence $f_j = \chi_{E_j}$ converges in measure
to 0 on $[0,1]$ since $m\{f_{k+2^n} > \varepsilon\} = 2^{-n}$ for any $0 < \varepsilon < 1$. But $(f_j)$ does not
converge pointwise, or even pointwise a.e., to 0. Indeed, since each $x \in [0,1]$
is the limit of a sequence of dyadic rationals, we have $\limsup_{j\to\infty} f_j(x) = 1$ for
every $x$. (Why?)
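The indexing in Example 19.1 (d) can be made concrete. In the minimal sketch below (helper names are ours), an index $j \ge 1$ is split as $j = k + 2^n$ with $0 \le k < 2^n$; the exponent $n$ is exactly `j.bit_length() - 1`, and exact rational arithmetic avoids rounding at dyadic endpoints. For the non-dyadic point $x = 1/3$, exactly one interval per level contains $x$, so $f_j(x) = 1$ for exactly one $j$ in each block $2^n \le j < 2^{n+1}$, even though $m(E_j) = 2^{-n} \to 0$.

```python
from fractions import Fraction


def dyadic_interval(j):
    """Return (n, k) with j = k + 2**n and 0 <= k < 2**n, so E_j = [k/2^n, (k+1)/2^n]."""
    n = j.bit_length() - 1
    return n, j - 2 ** n


def f(j, x):
    """Indicator of E_j evaluated at x (x should be a Fraction or an int)."""
    n, k = dyadic_interval(j)
    return 1 if Fraction(k, 2 ** n) <= x <= Fraction(k + 1, 2 ** n) else 0


def measure_of_support(j):
    """m(E_j) = 2^-n, which tends to 0 as j grows."""
    n, _ = dyadic_interval(j)
    return Fraction(1, 2 ** n)


x = Fraction(1, 3)
hits = [j for j in range(1, 2 ** 6) if f(j, x) == 1]
print(hits)  # one index j in each block 2^n <= j < 2^(n+1), n = 0..5
```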

The conclusion to be drawn from these few examples is that we have defined a new 
mode of convergence that is strictly different from any that we have seen thus far. Never- 
theless, convergence in measure is more closely related to pointwise convergence than 
you might imagine. As a first step in this direction, consider the following observation 
(recall the discussion following Theorem 17.13, Egorov’s theorem). 

Proposition 19.2. If $(f_n)$ converges almost uniformly to $f$ on $D$, then $(f_n)$ converges a.e. and in measure to $f$ on $D$.

proof. The fact that $(f_n)$ converges a.e. to $f$ follows from Exercise 17.36; we
need only show that $(f_n)$ converges in measure to $f$.

Let $\varepsilon > 0$. Since $(f_n)$ converges almost uniformly to $f$, there is a measurable
subset $E$ of $D$ with $m(E) < \varepsilon$ such that $(f_n)$ converges uniformly to $f$ on $D \setminus E$.
Thus, we can find an index $N$ such that $|f_n(x) - f(x)| < \varepsilon$ for all $x \in D \setminus E$ and
all $n \ge N$. In particular, for any $n \ge N$ we have
$$m\{x \in D : |f_n - f| > \varepsilon\} \le m\{x \in D \setminus E : |f_n - f| > \varepsilon\} + m(E) = m(E) < \varepsilon.$$
Hence, $(f_n)$ converges in measure to $f$ on $D$ (see Exercise 10). $\square$

By combining this observation with Egorov’s theorem, we arrive at a connection between convergence in measure and convergence pointwise a.e. on sets of finite measure
(Example 19.1 (c) demonstrates the necessity of this extra condition).

Corollary 19.3. If $(f_n)$ converges pointwise a.e. to $f$ on $D$, where $D$ has finite
measure, then $(f_n)$ also converges in measure to $f$ on $D$.


EXERCISES 

▷ 1. Find a sequence of integrable functions $(f_n)$ such that $\int |f_n| \to 0$ but $f_n \not\to 0$
pointwise a.e.

2. Find a sequence of integrable functions $(f_n)$ such that $f_n \to 0$ uniformly but
$\int |f_n| = 1$ for all $n$.

3. Prove that $f_n \xrightarrow{m} f$ if and only if $f_n - f \xrightarrow{m} 0$.

▷ 4. Fill in the missing details in Example 19.1 (d).

▷ 5. Show that $m\{|f - g| > \varepsilon\} \le m\{|f - h| > \varepsilon/2\} + m\{|h - g| > \varepsilon/2\}$. Thus,
the expression $m\{|f - g| > \varepsilon\}$ behaves rather like a metric.

6. Prove that limits in measure are unique up to equality a.e. That is, if $(f_n)$ converges
in measure to both $f$ and $g$, then $f = g$ a.e.

7. If $f_n \xrightarrow{m} f$ and $g_n \xrightarrow{m} g$, prove that $f_n + g_n \xrightarrow{m} f + g$.

8. If $f_n \xrightarrow{m} f$ and $g_n \xrightarrow{m} g$, does it follow that $f_n g_n \xrightarrow{m} fg$? If not, what
additional hypotheses are needed?

9. True or false? If $f_n \xrightarrow{m} f$, then $|f_n| \xrightarrow{m} |f|$.

▷ 10. Prove that $f_n \xrightarrow{m} f$ if and only if, given $\varepsilon > 0$, there exists an $N$ such that
$m\{|f_n - f| > \varepsilon\} < \varepsilon$ for all $n \ge N$.

11. If $(f_n)$ converges in measure to $f$, show that every subsequence of $(f_n)$ converges in measure to $f$.

▷ 12. We say that $(f_n)$ is Cauchy in measure if, given $\varepsilon > 0$, there exists an $N$ such
that $m\{|f_n - f_m| > \varepsilon\} < \varepsilon$ whenever $m, n \ge N$. If $(f_n)$ converges in measure, show
that $(f_n)$ is necessarily Cauchy in measure.

▷ 13. If $(f_n)$ is Cauchy in measure, and if some subsequence $(f_{n_k})$ converges in
measure to $f$, prove that $(f_n)$ converges in measure to $f$.


The connection between convergence in measure and pointwise convergence is sup- 
plied by the following fundamental result, due to F. Riesz. 

Theorem 19.4. Let $(f_n)$ be a sequence of real-valued measurable functions, all
defined on a common measurable domain $D$. If $(f_n)$ is Cauchy in measure, then
there is a measurable function $f : D \to \mathbb{R}$ such that $(f_n)$ converges in measure to
$f$. Moreover, there is a subsequence $(f_{n_k})$ of $(f_n)$ that converges pointwise a.e. to $f$.
proof. We first establish the “moreover” claim by showing that $(f_n)$ has a subsequence that is pointwise Cauchy. To accomplish this we appeal to an old trick:
Since $(f_n)$ is Cauchy in measure, we may choose a subsequence $(f_{n_k})$ satisfying
$$m\{x \in D : |f_{n_{k+1}}(x) - f_{n_k}(x)| > 2^{-k}\} < 2^{-k}$$
for all $k$. (How?) In other words, setting $E_k = \{|f_{n_{k+1}} - f_{n_k}| > 2^{-k}\}$, we have
$m(E_k) < 2^{-k}$ for all $k$.

Now, since $\sum_{k=1}^\infty m(E_k) < \infty$, the Borel-Cantelli lemma, Corollary 16.24, tells
us that the set
$$E = \limsup_{k\to\infty} E_k = \bigcap_{k=1}^\infty \bigcup_{j=k}^\infty E_j$$
has measure zero. Notice that for any $x \notin E$ we have $x \notin \bigcup_{j=k}^\infty E_j$ for some $k$
sufficiently large, and hence
$$|f_{n_{j+1}}(x) - f_{n_j}(x)| \le 2^{-j} \quad \text{for all } j \ge k. \tag{19.1}$$
In particular, we must have $\sum_{j=1}^\infty |f_{n_{j+1}}(x) - f_{n_j}(x)| < \infty$. Thus, for any $x \notin E$, the
limit
$$f(x) = \lim_{j\to\infty} f_{n_j}(x) = f_{n_1}(x) + \sum_{j=1}^\infty \big( f_{n_{j+1}}(x) - f_{n_j}(x) \big)$$
exists. If we define $f(x) = 0$ for $x \in E$, then we have defined a measurable
function $f$ for which $f_{n_k}(x) \to f(x)$ for any $x \notin E$; that is, $f_{n_k} \to f$ a.e.

All that remains is to check that $f_n \to f$ in measure. To this end, first notice
that for $x \notin E$ we may write
$$f(x) - f_{n_k}(x) = \sum_{j=k}^\infty \big( f_{n_{j+1}}(x) - f_{n_j}(x) \big),$$
and hence, from equation (19.1), for any $x \notin \bigcup_{j=k}^\infty E_j$ we have
$$|f(x) - f_{n_k}(x)| \le \sum_{j=k}^\infty |f_{n_{j+1}}(x) - f_{n_j}(x)| \le \sum_{j=k}^\infty 2^{-j} = 2^{-k+1}.$$
(In other words, $(f_{n_k})$ converges almost uniformly to $f$.) In particular, we must
have
$$m\{|f - f_{n_k}| > 2^{-k+1}\} \le m\Big( \bigcup_{j=k}^\infty E_j \Big) \le \sum_{j=k}^\infty 2^{-j} = 2^{-k+1}.$$
Thus, $(f_{n_k})$ converges in measure to $f$. Since $(f_n)$ is Cauchy in measure, this easily
implies that $(f_n)$ itself converges in measure to $f$. (See Exercise 13.) $\square$

It follows from Riesz’s theorem that the collection of measurable, real-valued functions on $D$ is closed under convergence in measure; that is, if $(f_n)$ is a sequence
of measurable functions and if, for some function $f$ on $D$, we have
$m\{x \in D : |f_n(x) - f(x)| > \varepsilon\} \to 0$ $(n \to \infty)$ for every $\varepsilon > 0$, then $f$ is measurable. (Why?)

Combining Riesz’s theorem with our very first observation on convergence in measure yields:

Corollary 19.5. If $(f_n)$ is a sequence of integrable functions that converges in $L_1$
to an integrable function $f$, then some subsequence of $(f_n)$ converges a.e. to $f$.
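Corollary 19.5 can be seen at work on the sequence of Example 19.1 (d): there $\int f_j = m(E_j) = 2^{-n}$ for $2^n \le j < 2^{n+1}$, so $f_j \to 0$ in $L_1$, yet $(f_j)$ converges at no point. The subsequence $j_k = 2^k$ picks out $E_{2^k} = [0, 2^{-k}]$, and $f_{2^k}(x) \to 0$ for every $x > 0$, that is, a.e. The minimal sketch below (our own illustration; function names are hypothetical) checks this at a sample point.

```python
from fractions import Fraction


def f(j, x):
    """Indicator of E_j = [k/2^n, (k+1)/2^n], where j = k + 2^n with 0 <= k < 2^n."""
    n = j.bit_length() - 1
    k = j - 2 ** n
    return 1 if Fraction(k, 2 ** n) <= x <= Fraction(k + 1, 2 ** n) else 0


def l1_norm(j):
    """Integral of f_j, i.e. the length 2^-n of E_j."""
    return Fraction(1, 2 ** (j.bit_length() - 1))


x = Fraction(1, 3)
print([f(2 ** k, x) for k in range(8)])  # [1, 1, 0, 0, 0, 0, 0, 0]
```

At $x = 0$ the subsequence stays equal to 1, but that single point has measure zero.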


EXERCISES 

▷ 14. Assuming that $m(D) < \infty$, prove that $(f_n)$ converges in measure to $f$ on
$D$ if and only if every subsequence of $(f_n)$ has a further subsequence that converges pointwise a.e. to $f$ on $D$. Is this still true without the requirement that
$m(D) < \infty$?

15. If $f_n \to f$ in $L_1$, prove that there is a subsequence of $(f_n)$ that converges
almost uniformly to $f$. [Hint: By passing to a subsequence we may suppose that
$f_n \to f$ a.e., and that $\int |f_n - f| < 2^{-n}$ for all $n$. Now repeat the proof of Egorov’s
theorem (Theorem 17.13), arguing that the set $E(1,k)$ has finite measure in this
case.]

▷ 16. Over a set of finite measure we can actually describe convergence in measure
in terms of a metric. For example, consider
$$d(f,g) = \int_0^1 \min\{|f(x) - g(x)|, 1\}\,dx,$$
where $f$ and $g$ are measurable, real-valued functions on $[0,1]$.

(a) Check that $d(f,g)$ is a pseudometric, with $d(f,g) = 0$ if and only if $f = g$
a.e. [Hint: $\rho(x,y) = \min\{|x - y|, 1\}$ defines a metric on $\mathbb{R}$; see Exercise 3.5.]

(b) Prove that $(f_n)$ converges in measure to $f$ on $[0,1]$ if and only if $d(f_n, f) \to 0$
as $n \to \infty$.

(c) Prove that $(f_n)$ is $d$-Cauchy if and only if $(f_n)$ is Cauchy in measure.

17. We denote the collection of all (equivalence classes of) measurable, finite a.e.,
extended real-valued functions on $[0,1]$ by $L_0[0,1]$, where we identify any two
functions that agree a.e. (just as we do for $L_1[0,1]$). Prove that $(L_0[0,1], d)$
is a complete metric space, where $d(f,g)$ is the expression defined in Exercise 16.

18. There is a wide variety of (pseudo)metrics that describe convergence in measure. For example, let
$$\tau(f,g) = \int_0^1 \frac{|f - g|}{1 + |f - g|},$$
and verify that $(f_n)$ converges in measure to $f$ on $[0,1]$ if and only if $\tau(f_n, f) \to 0$
as $n \to \infty$. [Hint: The metric $\sigma(x,y) = |x - y|/(1 + |x - y|)$ is equivalent to the
metric $\rho$ of Exercise 16 (a).]

19. In sharp contrast to convergence in measure, the topology of convergence pointwise a.e. cannot, in general, be described by a metric. (And this is precisely why
pointwise a.e. convergence is often problematic.) To see this, prove that:

(a) There is a sequence of measurable functions $(f_n)$ on $[0,1]$ that fails to converge pointwise a.e. to 0, but such that every subsequence of $(f_n)$ has a further
subsequence that does converge pointwise a.e. to 0.

(b) There is no metric $\rho$ on $L_0[0,1]$ satisfying $\rho(f_n, f) \to 0$ if and only if $f_n \to f$
a.e.

20. Note that while convergence in measure can sometimes be described by a metric, and while the collection of measurable functions is clearly a vector space, the
topology of convergence in measure is not always “compatible” with the vector space
operations. To see this, find a measurable, real-valued function $f$ on $[0,\infty)$, for example, such that $\lambda_n f \not\to 0$ in measure no matter how a sequence of scalars $\lambda_n \to 0$
is chosen. This means that the topology of convergence in measure on $[0,\infty)$ cannot
be described by a norm. Why?

▷ 21. Prove that Fatou’s lemma holds for convergence in measure: If $(f_n)$ is a sequence of nonnegative measurable functions and $f_n \xrightarrow{m} f$, show that $f \ge 0$
a.e. and that $\int f \le \liminf_{n\to\infty} \int f_n$. [Hint: First pass to a subsequence $(f_{n_k})$ with
$\lim_{k\to\infty} \int f_{n_k} = \liminf_{n\to\infty} \int f_n$.]

▷ 22. Let $(f_n)$ be a sequence of measurable functions with $|f_n| \le g$, for all $n$, where
$g \in L_1$. If $(f_n)$ converges to $f$ in measure, prove that $|f| \le g$ a.e. and that $(f_n)$
converges to $f$ in $L_1$. In other words, prove that the Dominated Convergence Theorem
holds for convergence in measure.


The $L_p$ Spaces


In this section we extend our discussion of the space of integrable functions L\ by 
introducing an entire scale of spaces L p , 1 < p < oo. The so-called Lebesgue spaces 
L p are the “continuous” analogues of the familiar sequence spaces l p . Just as with the 
t p spaces, we will find that the case p = oo demands special treatment, and so we begin 
by focusing on the range 1 < p < oo. 

Given a measurable subset £ of R (with m(£) > 0) and a real number 1 < p < oo, 
we define the space L P (E) to be the collection of all equivalence classes, under equality 
a.e., of measurable functions / : £ -*■ R for which \ f\ p € £|(£); that is, 

fe \f\ p < oo. 

We define a norm on $L_p(E)$ by setting

$$\|f\|_p = \left( \int_E |f|^p \right)^{1/p} \tag{19.2}$$

for $f \in L_p(E)$. This expression is clearly well defined; in other words, if $f = g$ a.e., then $\|f\|_p = \|g\|_p$. Of course, we will want to check that $L_p(E)$ is, indeed, a vector space and that this expression is actually a norm.
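For readers who like to experiment, definition (19.2) can be evaluated numerically for a concrete function. The Python sketch below is only an illustration: it assumes $f$ is continuous on $[a, b]$, so that a composite midpoint rule (a Riemann sum, not the Lebesgue integral itself) is a reasonable approximation, and the helper name `lp_norm` is hypothetical. For $f(x) = x$ on $[0, 1]$ the exact value is $\|f\|_p = (p+1)^{-1/p}$.

```python
def lp_norm(f, p, a=0.0, b=1.0, n=100_000):
    """Approximate ||f||_p = (integral over [a, b] of |f|^p)^(1/p), 1 <= p < infinity,
    by a composite midpoint (Riemann) sum with n subintervals."""
    h = (b - a) / n
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n)) * h
    return total ** (1.0 / p)


# For f(x) = x on [0, 1] the exact value is (1/(p + 1))**(1/p).
for p in (1, 2, 3):
    print(p, lp_norm(lambda x: x, p), (1 / (p + 1)) ** (1 / p))
```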
Please recall that we have already encountered a relative of the space $L_2(E)$ in Chapter Fifteen. In that chapter we used the symbol $L_2$ to denote, essentially, the space $L_2[-\pi, \pi]$ (except that we divided the expression in equation (19.2) by $\sqrt{\pi}$ and, of course, we spoke of Riemann integrable functions). For the moment we will ignore this earlier meeting, but we will have more to say about these close cousins later in the chapter.

Just as in the case of $L_1$, we will turn a blind eye to equivalence classes and simply speak of the elements of $L_p(E)$ as functions, but with the added proviso that statements concerning $L_p(E)$ functions are at best valid almost everywhere. As an example of this, please note that if $f \in L_p(E)$, then $f$ is finite a.e. on $E$; in other words, $f$ is allowed to take on infinite values at a “few” points.

And, again as in the case of $L_1$, the underlying set $E$ typically has little bearing on the properties of $L_p(E)$ that are of interest to us. If the discussion at hand does not depend on the set $E$, we will simply write $L_p$ to denote a typical space $L_p(E)$. For the most part, we will consider only the spaces $L_p[0, 1]$, $L_p[0, \infty)$, and $L_p(\mathbb{R})$. There is no harm here in assuming that the unadorned symbol $L_p$ denotes the space $L_p(\mathbb{R})$.

As we have already witnessed with the $\ell_p$-norm, the proof that equation (19.2) defines a norm will require a few elementary inequalities. Each of these should look very familiar (if not, you may want to review Lemmas 3.5-3.7 and Theorem 3.8). In what follows, we will concentrate on the range $1 < p < \infty$ (since we already know that $L_1$ is a normed space). To begin, notice that we certainly have $\|f\|_p = 0$ if and only if $f = 0$ a.e. (Why?) It is also clear that if $f \in L_p$, then $cf \in L_p$ for any scalar $c \in \mathbb{R}$; moreover, $\|cf\|_p = |c|\,\|f\|_p$. As with $\ell_p$, the real battle is with the triangle inequality. To strike a first blow in this battle, let’s check that $L_p$ is a vector space.

Lemma 19.6. Let $1 \le p < \infty$. If $f, g \in L_p$, then $f + g \in L_p$ and $\|f + g\|_p^p \le 2^p\left(\|f\|_p^p + \|g\|_p^p\right)$. Consequently, $L_p$ is a vector space.

Proof. The result follows from Lemma 3.5. Given $f, g \in L_p$, we have

$$|f(x) + g(x)|^p \le \left(|f(x)| + |g(x)|\right)^p \le 2^p\left(|f(x)|^p + |g(x)|^p\right) \quad \text{a.e.} \tag{19.3}$$

and hence

$$\int |f(x) + g(x)|^p\,dx \le \int \left(|f(x)| + |g(x)|\right)^p dx \le 2^p \int \left(|f(x)|^p + |g(x)|^p\right) dx < \infty. \qquad \square$$

Please note the presence of an “a.e.” in equation (19.3). Since we only know that $f$ and $g$ are finite a.e. (in fact, $f$ and $g$ may only be defined a.e.), we are only allowed to apply the inequality of Lemma 3.5 for a.e. $x$.

Next, we have the $L_p$ version of Hölder’s inequality.

Hölder’s Inequality 19.7. Let $1 < p < \infty$, and let $q$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$. If $f \in L_p(E)$ and $g \in L_q(E)$, then $fg \in L_1(E)$ and

$$\left| \int_E fg \right| \le \int_E |fg| \le \|f\|_p\,\|g\|_q.$$

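Proof. We may suppose that $\|f\|_p > 0$ and $\|g\|_q > 0$ (why?); hence, we need to prove that

$$\int_E \frac{|f|}{\|f\|_p} \cdot \frac{|g|}{\|g\|_q} \le 1.$$

We now appeal to Young’s inequality (Lemma 3.6): For a.e. $x$ we have

$$\frac{|f(x)|}{\|f\|_p} \cdot \frac{|g(x)|}{\|g\|_q} \le \frac{1}{p} \cdot \frac{|f(x)|^p}{\|f\|_p^p} + \frac{1}{q} \cdot \frac{|g(x)|^q}{\|g\|_q^q},$$

and so integration over $E$ yields

$$\int_E \frac{|f|}{\|f\|_p} \cdot \frac{|g|}{\|g\|_q} \le \frac{1}{p\,\|f\|_p^p} \int_E |f|^p + \frac{1}{q\,\|g\|_q^q} \int_E |g|^q = \frac{1}{p} + \frac{1}{q} = 1. \qquad \square$$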

When $p = q = 2$, the conclusion of Hölder’s inequality reads

$$\int_E |fg| \le \left( \int_E f^2 \right)^{1/2} \left( \int_E g^2 \right)^{1/2},$$

which is an inequality that is usually referred to as the Cauchy-Schwarz inequality (and one that we put to good use in Chapter Fifteen).
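Hölder’s inequality is easy to spot-check numerically. The sketch below again replaces the integrals by a midpoint Riemann sum on $[0, 1]$ (an assumption that is harmless for the continuous functions used here); the functions `integral` and `holder_sides` are hypothetical helpers, and the choices $f(x) = \sqrt{x}$, $g(x) = 1 - x$ are arbitrary. The last line checks an equality case: for $p = q = 2$ and $f = g$, the two sides agree (compare Exercise 24).

```python
def integral(h, a=0.0, b=1.0, n=20_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) for i in range(n)) * step


def holder_sides(f, g, p):
    """Return (integral of |fg|, ||f||_p * ||g||_q) on [0, 1], where 1/p + 1/q = 1."""
    q = p / (p - 1)  # conjugate exponent
    lhs = integral(lambda x: abs(f(x) * g(x)))
    norm_f = integral(lambda x: abs(f(x)) ** p) ** (1 / p)
    norm_g = integral(lambda x: abs(g(x)) ** q) ** (1 / q)
    return lhs, norm_f * norm_g


f = lambda x: x ** 0.5
g = lambda x: 1.0 - x
for p in (1.5, 2.0, 4.0):
    lhs, rhs = holder_sides(f, g, p)
    print(f"p = {p}: {lhs:.6f} <= {rhs:.6f}")

# Equality case (p = q = 2, f = g): both sides are the same number.
print(holder_sides(lambda x: x, lambda x: x, 2.0))
```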

Finally, we are ready for the $L_p$ version of Minkowski’s inequality (i.e., the triangle inequality). As an intermediate step, we isolate a key ingredient in the proof that is of independent interest; the proof of the following lemma is left as an exercise (Exercise 23).

Lemma 19.8. Let $1 < p < \infty$ and let $q$ be defined by $p^{-1} + q^{-1} = 1$. If $f \in L_p$, then $|f|^{p-1} \in L_q$ and

$$\big\| \, |f|^{p-1} \big\|_q = \|f\|_p^{p-1}.$$

Minkowski’s Inequality 19.9. Let $1 \le p < \infty$ and let $f, g \in L_p$. Then $f + g \in L_p$ and $\|f + g\|_p \le \|f\|_p + \|g\|_p$. Consequently, $\|\cdot\|_p$ is a norm.

Proof. The theorem is clearly true when $p = 1$, so we will suppose that $p > 1$ here.

The fact that $f + g \in L_p$ follows from Lemma 19.6. To prove the triangle inequality, we next apply Hölder’s inequality. If $q$ is defined by $p^{-1} + q^{-1} = 1$, that is, if $q = p/(p-1)$, then

$$\begin{aligned}
\|f + g\|_p^p &= \int |f + g|^p = \int |f + g| \cdot |f + g|^{p-1} \\
&\le \int |f| \cdot |f + g|^{p-1} + \int |g| \cdot |f + g|^{p-1} \\
&\le \left( \int |f|^p \right)^{1/p} \left( \int |f + g|^{(p-1)q} \right)^{1/q} + \left( \int |g|^p \right)^{1/p} \left( \int |f + g|^{(p-1)q} \right)^{1/q} \\
&= \|f\|_p\,\|f + g\|_p^{p-1} + \|g\|_p\,\|f + g\|_p^{p-1}.
\end{aligned}$$

Dividing by $\|f + g\|_p^{p-1}$ (if $\|f + g\|_p = 0$ there is nothing to prove), the result follows. $\square$
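Minkowski’s inequality can be checked exactly, with no quadrature at all, on step functions. In this sketch (the representation and the name `step_norm` are assumptions made for illustration) a step function on $[0, 1]$ is stored as its list of values on $N$ equal subintervals, so that $\|f\|_p^p$ is the finite sum $\frac{1}{N}\sum_i |f_i|^p$. The final lines show that the triangle inequality genuinely fails for $p = 1/2$, which is why the range $0 < p < 1$ needs separate treatment (see Exercise 41).

```python
import random


def step_norm(values, p):
    """||f||_p for the step function on [0, 1] equal to values[i] on the i-th of
    N = len(values) equal subintervals; the integral is an exact finite sum."""
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)


random.seed(0)
for p in (1.0, 1.5, 2.0, 3.0):
    f = [random.uniform(-1, 1) for _ in range(50)]
    g = [random.uniform(-1, 1) for _ in range(50)]
    f_plus_g = [x + y for x, y in zip(f, g)]
    lhs = step_norm(f_plus_g, p)
    rhs = step_norm(f, p) + step_norm(g, p)
    assert lhs <= rhs + 1e-12  # Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
    print(f"p = {p}: {lhs:.4f} <= {rhs:.4f}")

# For p = 1/2 the "triangle inequality" fails: chi_[0,1/2) + chi_[1/2,1] = 1.
half = 0.5
print(step_norm([1.0, 1.0], half), step_norm([1.0, 0.0], half) + step_norm([0.0, 1.0], half))
# prints 1.0 0.5
```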


EXERCISES 

▷ 23. Prove Lemma 19.8.

24. Show that equality holds in Hölder’s inequality if and only if $A|f|^{p-1} = B|g|$ for some nonnegative constants $A$ and $B$, not both zero, if and only if $C|f|^p = D|g|^q$ for some nonnegative constants $C$ and $D$, not both zero.

25. When does equality hold in Minkowski’s inequality? 

26. If $m(E) < \infty$ and $f \in L_q(E)$, show that $\|f\|_p \le (m(E))^{1/p - 1/q}\,\|f\|_q$ for $1 \le p < q < \infty$. Thus, as sets, $L_q(E) \subset L_p(E)$ whenever $m(E) < \infty$. [Hint: Hölder’s inequality.] In particular, if $E = [0, 1]$, notice that the $L_p$-norms increase with $p$; that is, $\|f\|_p \le \|f\|_q$ for $1 \le p < q < \infty$.

27. Given $1 \le p < q < \infty$, show that $L_p(\mathbb{R}) \ne L_q(\mathbb{R})$ by showing that neither containment holds. That is, construct functions $f \in L_p(\mathbb{R}) \setminus L_q(\mathbb{R})$ and $g \in L_q(\mathbb{R}) \setminus L_p(\mathbb{R})$.

28. Given $1 \le p, q, r < \infty$ with $r^{-1} = p^{-1} + q^{-1}$, prove the following generalization of Hölder’s inequality: $\|fg\|_r \le \|f\|_p\,\|g\|_q$ whenever $f \in L_p$ and $g \in L_q$.

29. Given $1 \le p, q < \infty$ and $0 < \alpha < 1$, let $r = \alpha p + (1 - \alpha) q$. Prove Liapounov’s inequality: $\|f\|_r^r \le \|f\|_p^{\alpha p}\,\|f\|_q^{(1-\alpha) q}$.

▷ 30. If $(f_n)$ converges to $f$ in $L_p$, prove that $(f_n)$ converges in measure to $f$. Thus, some subsequence of $(f_n)$ converges a.e. to $f$. If $(f_n)$ is Cauchy in $L_p$, prove that $(f_n)$ is Cauchy in measure.

31. If $(f_n)$ converges to $f$ in $L_p$, does $(|f_n|^p)$ converge to $|f|^p$ in $L_1$? In measure?

32. Given $1 \le p < \infty$, construct $f, g \in L_p(\mathbb{R})$ such that $fg \notin L_p(\mathbb{R})$. Thus, although $L_p$ is a vector space and a lattice under the usual pointwise a.e. operations on functions, it is not typically an algebra of functions.
▷ 33. If $f$ and $g$ are disjointly supported elements of $L_p$, that is, if $fg = 0$ a.e., show that $\|f + g\|_p^p = \|f\|_p^p + \|g\|_p^p$.

34. Let $(A_n)$ be a sequence of disjoint measurable sets. Show that $\sum_n a_n \chi_{A_n}$ converges in $L_p$ if and only if $\sum_n |a_n|^p\, m(A_n) < \infty$.

35. Show that the collection of integrable simple functions is dense in $L_p$, for any $1 \le p < \infty$. [Hint: Repeat the proof of Theorem 18.27 (i).]

36. For any $1 \le p < \infty$, prove that the space $L_p(\mathbb{R})$ is separable. Conclude that $L_p[0, 1]$ is also separable.

37. Given $1 \le p < \infty$, $f \in L_p[0, 1]$, and $\varepsilon > 0$, show that there is a function $g \in C[0, 1]$ such that $\|f - g\|_p < \varepsilon$. Conclude that $C[0, 1]$ is a dense subspace of $L_p[0, 1]$ (where $C[0, 1]$ is embedded into $L_p[0, 1]$ in the obvious way: $f \mapsto [f]$). [Hint: Theorem 18.27 (ii).]


We could now fashion a proof that $L_p$ is, in fact, a complete normed space following Theorem 18.24. Instead, though, we present a proof that uses a little of the machinery that we developed in the previous section.

Theorem 19.10. $L_p$ is complete for any $1 \le p < \infty$.

Proof. Fix $1 \le p < \infty$, and let $(f_n)$ be a Cauchy sequence in $L_p$. In particular, $(f_n)$ is a bounded sequence in $L_p$; that is, the sequence $\int |f_n|^p$ is bounded.

Now, $(f_n)$ is also Cauchy in measure (Exercise 30). Thus, by Theorem 19.4, there is a subsequence $(f_{n_k})$ that converges a.e. to some measurable $f$. To complete the proof, then, it is enough to show that $f \in L_p$ and that $(f_{n_k})$ converges to $f$ in $L_p$-norm (a Cauchy sequence with a convergent subsequence must itself converge to the same limit). But, since $(|f_{n_k}|^p)$ converges a.e. to $|f|^p$, we may appeal to Fatou’s lemma to conclude that

$$\int |f|^p \le \liminf_{k\to\infty} \int |f_{n_k}|^p \le \sup_n \int |f_n|^p < \infty.$$

Hence, $f \in L_p$. The proof that $(f_{n_k})$ converges to $f$ in $L_p$-norm follows similar lines: The sequence $\left(|f_{n_j} - f_{n_k}|^p\right)_{j=1}^{\infty}$ converges a.e. to $|f - f_{n_k}|^p$, and so, given $\varepsilon > 0$,

$$\int |f - f_{n_k}|^p \le \liminf_{j\to\infty} \int |f_{n_j} - f_{n_k}|^p \le \varepsilon,$$

provided that $k$ is sufficiently large. (Why?) $\square$


EXERCISES 

38. Suppose that $(f_n)$ is in $L_p$, $1 \le p < \infty$, with $\|f_n\|_p \le 1$ and $f_n \to f$ a.e. Prove that $f \in L_p$ and that $\|f\|_p \le 1$.

39. Let $f, f_n \in L_p$, $1 \le p < \infty$, and suppose that $f_n \to f$ a.e. Show that $\|f_n - f\|_p \to 0$ if and only if $\|f_n\|_p \to \|f\|_p$. [Hint: First note that $2^p\left(|f_n|^p + |f|^p\right) - |f_n - f|^p \ge 0$ a.e., and then apply Fatou’s lemma.] Note that the result also holds if “a.e.” is replaced by “in measure.”

40. For $1 \le p < \infty$ and $a, b \ge 0$, show that $a^p + b^p \le (a + b)^p \le 2^{p-1}(a^p + b^p)$, and that the reverse inequalities hold when $0 < p \le 1$. [Hint: Consider the function $\varphi(x) = (1 + x)^p / (1 + x^p)$ for $0 \le x \le 1$.]

41. It makes perfect sense to consider the spaces $L_p$ for $0 < p < 1$, too. In this range, expression (19.2) no longer defines a norm; nevertheless, $L_p$ is a complete metric linear space. For $0 < p < 1$, prove that:

(a) $L_p$ is a vector space.

(b) The expression $d(f, g) = \int |f - g|^p$ defines a complete, translation-invariant metric on $L_p$.

(c) Let $p^{-1} + q^{-1} = 1$ (note that $q < 0$!). If $0 \le f \in L_p$ and if $g > 0$ satisfies $0 < \int g^q < \infty$, then $\int fg \ge \left(\int f^p\right)^{1/p} \left(\int g^q\right)^{1/q}$.

(d) If $f, g \in L_p$ with $f, g \ge 0$, then $\|f + g\|_p \ge \|f\|_p + \|g\|_p$.

(e) If $f, g \in L_p$, then $\|f + g\|_p \le 2^{1/p}\left(\|f\|_p + \|g\|_p\right)$.


At the beginning of this section, the $L_p$ spaces were advertised as analogues of the $\ell_p$ spaces. As such, the space $L_\infty$, whatever it is, should look like a collection of bounded functions. But if $L_p$ functions are allowed to take on infinite values at a “few” points, how are we to make sense of the word “bounded”? The answer is that a “function” in $L_\infty$ is one that is equivalent to a bounded measurable function; that is, it is equal a.e. to a bounded function.

We say that a measurable function $f : E \to \mathbb{R}$ is essentially bounded (on $E$) if there exists some constant $0 \le A < \infty$ such that $|f| \le A$ a.e.; that is, $m\{x \in E : |f(x)| > A\} = 0$. Now there are many choices of the constant $A$, for if $|f| \le A$ a.e., then $|f| \le A + 1$ a.e., too. The smallest constant that works here is called the essential supremum of $f$ (over $E$), which is written

$$\operatorname*{ess.sup}_{x \in E} |f(x)| = \inf\left\{ A \ge 0 : m\{x \in E : |f| > A\} = 0 \right\}. \tag{19.4}$$

Please note that the essential supremum of $f$ would be unchanged even if we were to alter $f$ on a null set. In other words, if $f$ and $g$ are essentially bounded, and if $f = g$ a.e. on $E$, then we have $\operatorname{ess.sup}_E |f| = \operatorname{ess.sup}_E |g|$.

The essential supremum is not as strange a beast as you might imagine; it is really quite natural to consider almost everywhere boundedness. By way of an example, notice that if $f : [0, 1] \to \mathbb{R}$ is measurable and essentially bounded, and if $N \subset [0, 1]$ is a null set, then

$$\int_0^1 |f| = \int_{[0,1] \setminus N} |f| \le \sup_{x \notin N} |f(x)|.$$

Thus,

$$\int_0^1 |f| \le \inf\left\{ \sup_{x \notin N} |f(x)| : m(N) = 0 \right\}.$$

The right-hand side of this last inequality is precisely the essential supremum of $f$ over $[0, 1]$ (see Exercise 45), and it provides a somewhat better upper estimate for $\int_0^1 |f|$ than the uniform norm $\sup_{0 \le x \le 1} |f(x)|$ (see Exercise 44).
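The gap between $\sup$ and $\operatorname{ess.sup}$, and the estimate $\int_0^1 |f| \le \operatorname{ess.sup} |f|$, can be seen on a toy example. The representation below is a hypothetical one chosen for illustration: $f$ is a step function on $[0, 1]$ described by disjoint intervals `pieces`, together with finitely many exceptional points `points` where $f$ is redefined. Since a finite set is a null set, the exceptional values affect neither the integral nor the essential supremum, only the ordinary supremum.

```python
def ess_sup_abs(pieces, points):
    """ess.sup |f|: only pieces of positive length carry positive measure;
    the finitely many exceptional points form a null set and are ignored."""
    return max(abs(v) for (left, right, v) in pieces if right > left)


def sup_abs(pieces, points):
    """Ordinary sup |f|, which does see the exceptional points."""
    return max([abs(v) for (_, _, v) in pieces] + [abs(v) for v in points.values()])


def integral_abs(pieces, points):
    """Integral of |f| over [0, 1]; changing f on a null set changes nothing."""
    return sum((right - left) * abs(v) for (left, right, v) in pieces)


pieces = [(0.0, 0.5, 1.0), (0.5, 1.0, -2.0)]  # f = 1 on [0, 1/2), f = -2 on [1/2, 1]
points = {0.25: 100.0}                         # a "spike" at a single point
print(integral_abs(pieces, points), ess_sup_abs(pieces, points), sup_abs(pieces, points))
# prints 1.5 2.0 100.0
```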

Finally, we denote the collection of all equivalence classes, under equality a.e., of essentially bounded measurable functions on $E$ by $L_\infty(E)$, and we define

$$\|f\|_\infty = \operatorname*{ess.sup}_{x \in E} |f(x)| \tag{19.5}$$

for $f \in L_\infty(E)$. By our earlier remarks, this expression is well defined on equivalence classes; in other words, if $f = g$ a.e., then $\|f\|_\infty = \|g\|_\infty$. Just as with $L_p$, the symbol $L_\infty$ denotes a typical space $L_\infty(E)$.

As always, we will want to check that $L_\infty$ is a vector space and that $\|\cdot\|_\infty$ is a legitimate norm. Moreover, since this is nearly the same expression that we have been using for the uniform norm, we will want to check that this new norm coincides with the more familiar sup norm in certain cases. Most of these details will be left as exercises. To avoid potential confusion, though, throughout the remainder of this section the expression $\|\cdot\|_\infty$ will always denote the essential supremum norm (19.4).


EXERCISES 

▷ 42. Let $f : E \to \mathbb{R}$ be measurable and essentially bounded, and let $A = \operatorname{ess.sup}_{x \in E} |f(x)|$. Prove that:

(a) $0 \le A < \infty$ and $|f| \le A$ a.e.

(b) $f = 0$ a.e. if and only if $A = 0$.

(c) If $0 \le A' < A$, then $m\{|f| > A'\} \ne 0$.

Thus, $|f| \le \|f\|_\infty$ a.e., where $\|f\|_\infty$ is the $L_\infty$-norm of $f$, and $\|f\|_\infty$ is the smallest constant with this property.

▷ 43. If $f \in L_\infty$, is $m\{|f| = \|f\|_\infty\} > 0$? Is $\{|f| = \|f\|_\infty\} \ne \emptyset$? Explain.

44. If $f : E \to \mathbb{R}$ is a measurable, (everywhere) bounded function, prove that $\operatorname{ess.sup}_E |f| \le \sup_E |f|$. Give an example showing that strict inequality can occur.

▷ 45. If $f : E \to \mathbb{R}$ is essentially bounded, show that

$$\operatorname*{ess.sup}_{x \in E} |f(x)| = \inf\left\{ \sup_{x \in E \setminus N} |f(x)| : m(N) = 0 \right\}.$$

Moreover, show that this infimum is actually attained; that is, prove that there is a null set $N$ such that $\operatorname{ess.sup}_E |f| = \sup_{E \setminus N} |f|$.

▷ 46. Let $f \in C[0, 1]$ and $0 \le A < \infty$. If $|f(x)| \le A$ for a.e. $x \in [0, 1]$, prove that, in fact, $|f(x)| \le A$ for all $x \in [0, 1]$. Conclude that

$$\sup_{0 \le x \le 1} |f(x)| = \operatorname*{ess.sup}_{0 \le x \le 1} |f(x)|$$

in this case. In other words, $\|f\|_{C[0,1]} = \|f\|_{L_\infty[0,1]}$.
▷ 47. If $f, g : E \to \mathbb{R}$ are essentially bounded, show that $f + g$ is essentially bounded and that $\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$, where $\|\cdot\|_\infty$ denotes the $L_\infty$-norm. [Hint: It is enough to show that $|f + g| \le \|f\|_\infty + \|g\|_\infty$ a.e.] Conclude that $L_\infty$ is a normed vector space.

48. If $f, g \in L_\infty$, show that $fg \in L_\infty$ and $\|fg\|_\infty \le \|f\|_\infty\,\|g\|_\infty$. Conclude that $L_\infty$ is a normed algebra. [Compare this with Exercise 32.] Is $L_\infty$ a normed lattice (under the usual pointwise a.e. ordering)?

▷ 49. Prove that $L_\infty(\mathbb{R})$ is not separable. More generally, if $m(E) > 0$, then $L_\infty(E)$ is not separable. [Hint: If $A$ and $B$ are disjoint, notice that $\|\chi_A - \chi_B\|_\infty = 1$.]

50. Show that the collection of all simple functions is dense in $L_\infty$. [Hint: Recall the Basic Construction, Theorem 17.14.] If $m(E) < \infty$, show that the integrable simple functions are dense in $L_\infty(E)$. Is this true without the restriction that $m(E) < \infty$? Explain.

▷ 51. If $m(E) < \infty$, show that, as sets, $L_\infty(E) \subset L_p(E)$, for any $1 \le p < \infty$, and that $\|f\|_p \le (m(E))^{1/p}\,\|f\|_\infty$ for any $f \in L_\infty(E)$. In particular, if $f \in L_\infty[0, 1]$, then $\|f\|_1 \le \|f\|_p \le \|f\|_\infty$ for any $1 \le p < \infty$.

52. If $f \in L_\infty[0, 1]$, show that $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$. [Hint: First note that $\lim_{p\to\infty} \|f\|_p$ exists by Exercise 51. Next, consider the integral of $|f|^p$ over the set $\{|f| > \|f\|_\infty - \varepsilon\}$.]

53. If $m(E) < \infty$, show that $L_\infty(E)$ is a dense subspace of $L_p(E)$, for any $1 \le p < \infty$.

▷ 54. Given $f \in L_p$, $1 \le p < \infty$, and $g \in L_\infty$, prove that $fg \in L_p$ and that $\|fg\|_p \le \|f\|_p\,\|g\|_\infty$. [Note that for $p = 1$ this gives Hölder’s inequality (for $q = \infty$).]

55. Let $f_n \to f$ in $L_p$, $1 \le p < \infty$, and let $(g_n)$ be a sequence in $L_\infty$ with $\|g_n\|_\infty \le 1$ and $g_n \to g$ a.e. Show that $f_n g_n \to fg$ in $L_p$.


Finally, a word or two about convergence in $L_\infty$. We begin with a simple observation: Convergence in $L_\infty$ is the same as uniform a.e. convergence.

Lemma 19.11. If $(f_n)$ converges to $0$ in $L_\infty(E)$, then there is a null set $A \subset E$ such that $(f_n)$ converges uniformly to $0$ on $E \setminus A$.

Proof. For each $n$, there is a null set $A_n$ such that $|f_n(x)| \le \|f_n\|_\infty$ for all $x \in E \setminus A_n$. If we set $A = \bigcup_n A_n$, then $A$ is a null set and

$$\sup_{x \in E \setminus A} |f_n(x)| \le \sup_{x \in E \setminus A_n} |f_n(x)| = \|f_n\|_\infty \to 0 \quad \text{as } n \to \infty. \qquad \square$$


Theorem 19.12. $L_\infty$ is complete.

Proof. Let $(f_n)$ be a Cauchy sequence in $L_\infty(E)$. Then there is a null set $A$ such that $(f_n)$ is uniformly Cauchy on $E \setminus A$. Indeed, for each $m$ and $n$, we may choose a null set $A_{m,n}$ such that $|f_m(x) - f_n(x)| \le \|f_m - f_n\|_\infty$ for all $x \in E \setminus A_{m,n}$. Putting $A = \bigcup_{m,n} A_{m,n}$ does the trick. Thus, $(f_n)$ converges uniformly on $E \setminus A$. If we define $f(x) = \lim_{n\to\infty} f_n(x)$ for $x \in E \setminus A$ and $f(x) = 0$ for $x \in A$, then $f$ is a bounded measurable function. All that remains is to check that $\|f_n - f\|_\infty \to 0$. But, since $A$ is a null set,

$$\|f_n - f\|_\infty \le \sup_{x \in E \setminus A} |f_n(x) - f(x)|$$

(see Exercise 45), and the right-hand side tends to $0$ as $n \to \infty$, since $(f_n)$ converges uniformly to $f$ on $E \setminus A$. $\square$


EXERCISE 

56. Under the obvious inclusion (i.e., $f \mapsto [f]$), show that $C[0, 1]$ is a closed subspace of $L_\infty[0, 1]$.


Approximation of $L_p$ Functions

In analogy with Theorem 18.27, we next discuss the approximation of elements of $L_p$ by simpler functions. As with Theorem 18.27, most of the work here is done by the Basic Construction.

Theorem 19.13. Let $1 \le p < \infty$, let $f \in L_p(\mathbb{R})$, and let $\varepsilon > 0$. Then:

(i) There is an integrable simple function $\varphi$ with $\|f - \varphi\|_p < \varepsilon$.

(ii) There is a continuous function $g : \mathbb{R} \to \mathbb{R}$ such that $g = 0$ outside some bounded interval and such that $\|f - g\|_p < \varepsilon$.

(iii) There is an (integrable) step function $h$ with $\|f - h\|_p < \varepsilon$.

Proof. The key observation here is that $|f|^p \in L_1(\mathbb{R})$. Thus, from the Monotone Convergence Theorem, we can find a compact interval $[a, b]$ such that $\int_{\mathbb{R} \setminus [a,b]} |f|^p < (\varepsilon/4)^p$. We will build all of our approximating functions with support in $[a, b]$; that is, each will be chosen to vanish outside of $[a, b]$.

(i) There is a sequence of (integrable) simple functions $(\varphi_k)$ with $\varphi_k = 0$ off $[a, b]$, $\varphi_k \to f$ on $[a, b]$, and $|\varphi_k| \le |f|$. Using equation (19.3), we have $|f - \varphi_k|^p \le 2^p\left(|f|^p + |\varphi_k|^p\right) \le 2^{p+1} |f|^p$, and it now follows from the Dominated Convergence Theorem that $\int_a^b |f - \varphi_k|^p \to 0$. Finally, choose $k$ and $\varphi = \varphi_k$ with $\int_a^b |f - \varphi|^p < (\varepsilon/4)^p$ and, hence, $\int_{\mathbb{R}} |f - \varphi|^p < 2(\varepsilon/4)^p \le (\varepsilon/2)^p$.

(ii) $\varphi$ is bounded; choose $K$ such that $|\varphi| \le K$. Now, from Theorem 17.20 (and Exercise 45), we can find a continuous function $g$ on $\mathbb{R}$, vanishing outside of $[a, b]$, such that $|g| \le K$ and $m\{g \ne \varphi\} < (\varepsilon/8K)^p$. Then,

$$\int_{\mathbb{R}} |\varphi - g|^p \le (2K)^p\, m\{g \ne \varphi\} < (2K)^p \left(\frac{\varepsilon}{8K}\right)^p = \left(\frac{\varepsilon}{4}\right)^p,$$

and hence $\|f - g\|_p \le \|f - \varphi\|_p + \|\varphi - g\|_p < 3\varepsilon/4$.

The proof of (iii) is left as an exercise. □ 
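The dyadic simple functions behind part (i) of the proof are easy to compute. The sketch below assumes the usual form of such a construction, $\varphi_k = \min\{2^{-k}\lfloor 2^k f \rfloor,\ k\}$ for $f \ge 0$ (the precise form of the Basic Construction, Theorem 17.14, may differ in details), takes $f(x) = \sqrt{x}$ on $[0, 1]$, and approximates $\|f - \varphi_k\|_2$ by a midpoint sum; the helper names are hypothetical. Because $0 \le f - \varphi_k < 2^{-k}$ here, the errors should decrease and stay below $2^{-k}$.

```python
import math


def phi(k, y):
    """k-th dyadic approximation of a value y >= 0: round y down to a multiple
    of 2**-k and cap the result at height k (assumed form of the construction)."""
    return min(math.floor(y * 2 ** k) / 2 ** k, k)


def lp_error(f, k, p, nodes=20_000):
    """Midpoint-rule approximation of ||f - phi_k(f)||_p on [0, 1]."""
    h = 1.0 / nodes
    total = 0.0
    for i in range(nodes):
        y = f((i + 0.5) * h)
        total += abs(y - phi(k, y)) ** p
    return (total * h) ** (1.0 / p)


f = math.sqrt
errs = [lp_error(f, k, 2.0) for k in range(1, 8)]
for k, e in enumerate(errs, start=1):
    print(k, e, 2.0 ** -k)
```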

Corollary 19.14. The integrable simple functions are dense in $L_p$ for $1 \le p < \infty$.

Corollary 19.15. $C[a, b]$ is dense in $L_p[a, b]$ for $1 \le p < \infty$. Hence, $L_\infty[a, b]$ is dense in $L_p[a, b]$ for $1 \le p < \infty$.

Corollary 19.14 and the first statement in Corollary 19.15 do not hold for $p = \infty$. However, as an almost immediate consequence of the Basic Construction, it is true that the simple functions are dense in $L_\infty$ and that the integrable simple functions are dense in $L_\infty(E)$ whenever $m(E) < \infty$ (see Exercise 50).


EXERCISES 

57. Prove Theorem 19.13 (iii).

58. Prove Proposition 19.16. 

59. Fix $1 \le p < \infty$. Prove or disprove: When considered as a subset of $L_p[a, b]$, the Riemann integrable functions $\mathcal{R}[a, b]$ are dense in $L_p[a, b]$. Does your answer depend on $p$? [Hint: Recall that the elements of $\mathcal{R}[a, b]$ are bounded measurable functions.]

▷ 60. Fix $1 \le p < \infty$, $f \in L_p[a, b]$, and $\varepsilon > 0$. Show that there is an algebraic polynomial $Q$ and a trig polynomial $T$ such that $\|f - Q\|_p < \varepsilon$ and $\|f - T\|_p < \varepsilon$.

61. Prove that $L_p(\mathbb{R})$ and $L_p[0, 1]$ are separable for any $1 \le p < \infty$. Try to give at least two different proofs.

62. Let $1 \le p < \infty$, and let $f \in L_p(\mathbb{R})$. Given $\varepsilon > 0$, show that there is a $\delta > 0$ such that $\|f \chi_A\|_p < \varepsilon$ whenever $m(A) < \delta$. [Hint: This is easy if $f$ is bounded.] Does this result hold for $p = \infty$? Explain.

63. Fix $1 \le p < \infty$. Given $h \in \mathbb{R}$, define a map $T_h$ on $L_p(\mathbb{R})$ by setting $(T_h(f))(x) = f(x + h)$ for $f \in L_p(\mathbb{R})$ and $x \in \mathbb{R}$.

(a) Show that $T_h(f) \in L_p(\mathbb{R})$ and that $\|T_h(f)\|_p = \|f\|_p$.

(b) Show that the map $f \mapsto T_h(f)$ is linear. Conclude that $T_h$ is an isometry on $L_p(\mathbb{R})$ for any $h$.

(c) Prove that $\lim_{h\to 0} \|f - T_h f\|_p = 0$. [Hint: This is easy if $f$ is uniformly continuous.]

(d) Does $\lim_{h\to\infty} \|f - T_h f\|_p$ exist? If so, compute it.

64. Let $1 < p < \infty$, let $f \in L_p$, and let $g \in L_q$, where $p^{-1} + q^{-1} = 1$.

(a) Show that $h(x) = \int f(t)\,g(x + t)\,dt$ defines a bounded continuous function on $\mathbb{R}$ satisfying $\|h\|_\infty \le \|f\|_p\,\|g\|_q$. [Hint: Exercise 63 (c).]

(b) If one of $f$ or $g$ is differentiable, show that $h$ is also differentiable and find a formula for $h'(x)$ (in terms of either $f'$ or $g'$).







More on Fourier Series 

With the Lebesgue integral and the $L_p$ spaces now at our disposal, we take another brief look at Fourier series with an eye toward improving, or at least restating, a few key results from Chapter Fifteen.

Following our earlier notation, we define the Fourier series of a $2\pi$-periodic function $f : \mathbb{R} \to \mathbb{R}$ by

$$\frac{a_0}{2} + \sum_{k=1}^{\infty} \left( a_k \cos kx + b_k \sin kx \right),$$

where the Fourier coefficients $a_k$ and $b_k$ are given by

$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos kt\,dt \quad \text{and} \quad b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin kt\,dt. \tag{19.6}$$

However, we now require that $f$ be Lebesgue integrable on $[-\pi, \pi]$ and we interpret each of the integrals in equation (19.6) as a Lebesgue integral.
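To see (19.6) in action, here is a small numerical sketch (the helper name is hypothetical, and the midpoint rule is an assumption standing in for the Lebesgue integral, adequate for the smooth integrand used here). For $f(x) = x$ on $[-\pi, \pi]$, integration by parts gives $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$, which the computed values should reproduce to several decimal places.

```python
import math


def fourier_coefficients(f, k, n=20_000):
    """Approximate a_k and b_k from (19.6) by the midpoint rule on [-pi, pi]."""
    h = 2 * math.pi / n
    ts = [-math.pi + (i + 0.5) * h for i in range(n)]
    a = sum(f(t) * math.cos(k * t) for t in ts) * h / math.pi
    b = sum(f(t) * math.sin(k * t) for t in ts) * h / math.pi
    return a, b


for k in range(1, 5):
    a, b = fourier_coefficients(lambda x: x, k)
    print(k, round(a, 6), round(b, 6), 2 * (-1) ** (k + 1) / k)
```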

Virtually every observation, and every calculation, that we made in Chapter Fifteen 
will remain valid in this new setting with only a few minor adjustments here and there. A 
rather obvious modification is that the Riemann integral should everywhere be replaced 
by the Lebesgue integral, thus providing us with a larger class of functions that admit 
representation by Fourier series. 

On the other hand, there is one major difference here: Because we have assumed that Riemann integrable functions are bounded, we know that $f^2$ is Riemann integrable whenever $f$ is; in other words, in the context of Chapter Fifteen, the collection $L_2[a, b]$ is simply a new name for the collection $\mathcal{R}[a, b]$. But we make no such boundedness assumption on Lebesgue integrable functions. In particular, the integrability of $f$ now tells us nothing about the integrability of $f^2$. The Lebesgue spaces $L_2[a, b]$ and $L_1[a, b]$, as presented in this chapter, are quite different from their Chapter Fifteen cousins. Thus, the “$L_2$-theory” developed in Chapter Fifteen is especially meaningful in the context of the Lebesgue integral: Isolating the collection of Lebesgue square-integrable functions is not only a convenience but a necessity.

As an example of this subtle difference, we first note that Observation 15.1 (b) (as restated in Observation 15.1 (d)) remains true for the Lebesgue integral provided we assume that $f^2$ is Lebesgue integrable on $[-\pi, \pi]$. In what follows, it will again be helpful to renormalize the $L_2$-norm by setting

$$\|f\|_2 = \left( \frac{1}{\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx \right)^{1/2}. \tag{19.7}$$

We will take this expression to be the norm on $L_2[-\pi, \pi]$ throughout the remainder of the section. It is easy to see that this normalization has no effect on the results for $L_2[-\pi, \pi]$ that were developed earlier in this chapter. In particular, Hölder’s inequality, Minkowski’s inequality, and Theorem 19.10 all hold in this new setting (see Exercises 65-67).
Proposition 19.16. If $f \in L_2[-\pi, \pi]$, then $s_n(f)$ is the nearest point to $f$ out of $\mathcal{T}_n$ relative to the $L_2$-norm. In other words,

$$\inf_{T \in \mathcal{T}_n} \|f - T\|_2 = \|f - s_n(f)\|_2.$$

Moreover,

$$\|f - s_n(f)\|_2^2 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)^2\,dx - \frac{a_0^2}{2} - \sum_{k=1}^{n} \left( a_k^2 + b_k^2 \right) = \|f\|_2^2 - \|s_n(f)\|_2^2. \tag{19.8}$$

Proof. The proof of the proposition is identical to that given for Observation 15.1 (b) once we justify the existence of the integrals used in that proof. Of course, if $f \in L_2[-\pi, \pi]$, then $f^2$ is integrable on $[-\pi, \pi]$. Next, notice that since a trig polynomial $T \in \mathcal{T}_n$ is (continuous and) bounded, it follows that $T^2$ and, hence, $(f - T)^2$ are Lebesgue integrable on $[-\pi, \pi]$. Finally, if both $f$ and $T$ are in $L_2[-\pi, \pi]$, then Hölder’s inequality assures us that the product $fT$ is integrable on $[-\pi, \pi]$. Thus, the various integrals used in the proof of Observation 15.1 (b) exist. $\square$
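Identity (19.8) and the best-approximation property can be checked numerically for $f(x) = x$, whose Fourier coefficients are $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$ (integrate by parts), and for which $\|f\|_2^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx = 2\pi^2/3$ in the normalization (19.7). The sketch uses a midpoint sum for that normalized norm; the helper `l2_sq` is hypothetical. Perturbing $s_n(f)$ by another element of $\mathcal{T}_n$ should only increase the distance to $f$.

```python
import math


def l2_sq(g, nodes=20_000):
    """||g||_2^2 = (1/pi) * integral of g^2 over [-pi, pi], as in (19.7); midpoint rule."""
    h = 2 * math.pi / nodes
    return sum(g(-math.pi + (i + 0.5) * h) ** 2 for i in range(nodes)) * h / math.pi


n = 5
b = [2 * (-1) ** (k + 1) / k for k in range(1, n + 1)]  # sine coefficients of f(x) = x


def s_n(x):
    """n-th partial sum of the Fourier series of f(x) = x (all a_k are 0)."""
    return sum(bk * math.sin(k * x) for k, bk in enumerate(b, start=1))


lhs = l2_sq(lambda x: x - s_n(x))                      # ||f - s_n(f)||_2^2
rhs = 2 * math.pi ** 2 / 3 - sum(bk ** 2 for bk in b)  # ||f||_2^2 - ||s_n(f)||_2^2
print(lhs, rhs)

# Any other T in T_n is farther from f; here T = s_n(f) + 0.1 cos 2x.
worse = l2_sq(lambda x: x - s_n(x) - 0.1 * math.cos(2 * x))
print(worse > lhs)  # prints True
```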


As a consequence of Proposition 19.16, it is immediate that Bessel’s inequality (15.3) holds for $f \in L_2[-\pi, \pi]$. That is, if $f \in L_2[-\pi, \pi]$, then the Fourier coefficients of $f$ are square-summable and satisfy

$$\frac{a_0^2}{2} + \sum_{k=1}^{\infty} \left( a_k^2 + b_k^2 \right) \le \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)^2\,dx. \tag{19.9}$$

Hence, Riemann’s Lemma 15.4 is also valid in this case. But we can say even more: As evidence that the Lebesgue integral is easy to work with in this regard, we next sketch a direct proof of the Riemann-Lebesgue Lemma (Exercise 18.55), Lebesgue’s variation on Riemann’s Lemma.

Theorem 19.17. If $f$ is Lebesgue integrable on $[-\pi, \pi]$, then

$$\lim_{n\to\infty} \int_{-\pi}^{\pi} f(x) \cos nx\,dx = 0 = \lim_{n\to\infty} \int_{-\pi}^{\pi} f(x) \sin nx\,dx.$$


Proof. First consider the case $f = \chi_{[a,b]}$, where $-\pi \le a < b \le \pi$. Clearly,

$$\int_{-\pi}^{\pi} f(x) \cos nx\,dx = \int_a^b \cos nx\,dx = \frac{\sin nb - \sin na}{n} \to 0$$

as $n \to \infty$.

Now, given $\varepsilon > 0$, Theorem 18.27 (iii) supplies a step function $h$, vanishing outside $[-\pi, \pi]$, such that $\int_{-\pi}^{\pi} |f - h| < \varepsilon$. But $h$ is just a linear combination of functions of the form $\chi_{[a,b]}$. Thus, from the first part of the proof (and the linearity of the integral) we may choose $n$ sufficiently large so that $\left| \int_{-\pi}^{\pi} h(x) \cos nx\,dx \right| < \varepsilon$.
The triangle inequality does the rest:

$$\left| \int_{-\pi}^{\pi} f(x) \cos nx\,dx \right| \le \left| \int_{-\pi}^{\pi} h(x) \cos nx\,dx \right| + \left| \int_{-\pi}^{\pi} \left( f(x) - h(x) \right) \cos nx\,dx \right| < \varepsilon + \int_{-\pi}^{\pi} |f - h| < 2\varepsilon.$$

In other words, $\int_{-\pi}^{\pi} f(x) \cos nx\,dx \to 0$ as $n \to \infty$. The proof with $\sin nx$ in place of $\cos nx$ is essentially identical. $\square$
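Theorem 19.17 applies to functions that are integrable but not square-integrable, where Bessel’s inequality (19.9) is unavailable. A standard example is $f(x) = |x|^{-1/2}$: here $f \in L_1[-\pi, \pi]$, but $f^2 = 1/|x|$ is not integrable. The sketch below (an illustration only, not part of the proof; the helper name is hypothetical) removes the singularity with the substitution $x = u^2$, which gives $\int_{-\pi}^{\pi} |x|^{-1/2} \cos nx\,dx = 4\int_0^{\sqrt{\pi}} \cos(nu^2)\,du$, and evaluates the latter with Simpson’s rule. The values should decay to $0$, roughly like $\sqrt{2\pi/n}$ (a Fresnel-integral estimate, stated here without proof).

```python
import math


def cos_moment(n, m=200_000):
    """Approximate the integral of |x|**(-1/2) * cos(n x) over [-pi, pi].

    With x = u**2 on [0, pi] (and evenness) this equals
    4 * integral_0^sqrt(pi) cos(n u**2) du, whose integrand is smooth;
    composite Simpson's rule with m (even) subintervals is used."""
    b = math.sqrt(math.pi)
    h = b / m
    s = math.cos(0.0) + math.cos(n * b * b)
    for i in range(1, m):
        u = i * h
        s += (4 if i % 2 else 2) * math.cos(n * u * u)
    return 4 * s * h / 3


for n in (10, 100, 1000):
    print(n, cos_moment(n), math.sqrt(2 * math.pi / n))
```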

Clearly, Observations 15.1 (e) and 15.1 (f) are unaffected by our choice of integral, so we next revise Observation 15.1 (g). The proof of the following proposition is essentially identical to that given in Observation 15.1 (g) but, since it is an extremely important result, the details bear repeating. It is sometimes referred to as the Riesz-Fischer theorem.

Proposition 19.18. If $f \in L_2[-\pi, \pi]$, then $\|s_n(f) - f\|_2 \to 0$.

Proof. Let $\varepsilon > 0$, and choose a function $g \in C^{2\pi}$ satisfying $\|f - g\|_2 < \varepsilon$ (see Exercise 60). Next, since $s_n$ is linear, we have

$$\|f - s_n(f)\|_2 \le \|f - g\|_2 + \|g - s_n(g)\|_2 + \|s_n(f - g)\|_2.$$

From Bessel’s inequality, we have $\|s_n(f - g)\|_2 \le \|f - g\|_2 < \varepsilon$ and so, from Observation 15.1 (e), we get

$$\|f - s_n(f)\|_2 < 2\varepsilon + \|g - s_n(g)\|_2 < 3\varepsilon$$

for all $n$ sufficiently large. $\square$

As an immediate consequence of Proposition 19.18, notice that if $f \in L_2[-\pi, \pi]$, then $s_n(f)$ converges in measure to $f$ on $[-\pi, \pi]$ (see Exercise 30). Thus, although the pointwise convergence of Fourier series is a thorny problem in general, every $f \in L_2[-\pi, \pi]$ has a Fourier series that converges in at least this “general” sense. Moreover, Riesz’s Theorem 19.4 now supplies a subsequence of $(s_n(f))$ that converges pointwise a.e. to $f$. Better and better! Since $L_2[-\pi, \pi]$ contains the (bounded) Riemann integrable functions on $[-\pi, \pi]$, we have arrived at a simple, general convergence result for the Fourier series of a large class of functions.

Returning to our “surgery” on Observation 15.1, notice that Parseval’s equation (15.5) follows easily from Proposition 19.18. Specifically, if $f \in L_2[-\pi, \pi]$, then, from equation (19.8) and Proposition 19.18 we have $\|f\|_2^2 = \lim_{n\to\infty} \|s_n(f)\|_2^2$. In other words,

$$\frac{1}{\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx = \frac{a_0^2}{2} + \sum_{k=1}^{\infty} \left( a_k^2 + b_k^2 \right) \tag{19.10}$$

for $f \in L_2[-\pi, \pi]$. It now follows, as in Observation 15.1 (i), that distinct elements of $L_2[-\pi, \pi]$ have distinct Fourier series. That is, if $f, g \in L_2[-\pi, \pi]$ satisfy $\int_{-\pi}^{\pi} [f(x) - g(x)] \cos nx\,dx = 0$ and $\int_{-\pi}^{\pi} [f(x) - g(x)] \sin nx\,dx = 0$ for all $n = 0, 1, 2, \ldots$, then $\int_{-\pi}^{\pi} [f(x) - g(x)]^2\,dx = 0$ and, hence, $f - g = 0$ a.e.
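For $f(x) = x$ the two sides of (19.10) can be compared directly: the left side is $\frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx = 2\pi^2/3$, while $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$, so the right side is $\sum_{k \ge 1} 4/k^2$. (As a by-product, this recovers Euler’s sum $\sum_{k\ge1} 1/k^2 = \pi^2/6$.) The short Python loop below only sums the series; each partial sum stays below the target, as Bessel’s inequality (19.9) requires, and the gap after $n$ terms shrinks roughly like $4/n$.

```python
import math

# Left side of (19.10) for f(x) = x, with the 1/pi normalization of (19.7).
target = (1 / math.pi) * (2 * math.pi ** 3 / 3)   # = 2*pi^2/3

partial = 0.0
for k in range(1, 100_001):
    partial += (2 / k) ** 2        # a_k = 0 and b_k^2 = 4/k^2
    assert partial < target        # Bessel's inequality (19.9)
    if k in (10, 100, 1_000, 100_000):
        print(k, partial, target - partial)
```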
Our next result is, in a sense, a converse to Bessel’s inequality. It is also sometimes referred to as the Riesz-Fischer theorem.

Proposition 19.19. If $(a_k)_{k=0}^{\infty}$ and $(b_k)_{k=1}^{\infty}$ are real sequences with $\sum_{k=1}^{\infty} \left( a_k^2 + b_k^2 \right) < \infty$, then there is an $f \in L_2[-\pi, \pi]$ satisfying

$$s_n(f)(x) = \frac{a_0}{2} + \sum_{k=1}^{n} \left( a_k \cos kx + b_k \sin kx \right)$$

for all $n$, and

$$\|f\|_2^2 = \frac{1}{\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx = \frac{a_0^2}{2} + \sum_{k=1}^{\infty} \left( a_k^2 + b_k^2 \right).$$

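Proof. Let $T_n(x) = (a_0/2) + \sum_{k=1}^{n} \left( a_k \cos kx + b_k \sin kx \right)$ and notice that for $0 \le m < n$ we have

$$\|T_n - T_m\|_2^2 = \left\| \sum_{k=m+1}^{n} \left( a_k \cos kx + b_k \sin kx \right) \right\|_2^2 = \sum_{k=m+1}^{n} \left( a_k^2 + b_k^2 \right) \to 0 \quad \text{as } m, n \to \infty.$$

Thus, $(T_n)$ is a Cauchy sequence in $L_2[-\pi, \pi]$ and, as such, converges to some $f \in L_2[-\pi, \pi]$ by Theorem 19.10. Now notice that if $k \le n$, we have

$$\left| a_k - \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\,dx \right| = \left| \frac{1}{\pi} \int_{-\pi}^{\pi} \left( T_n(x) - f(x) \right) \cos kx\,dx \right| \le \frac{1}{\pi} \int_{-\pi}^{\pi} |T_n(x) - f(x)|\,dx \le \sqrt{2}\, \|T_n - f\|_2 \to 0 \quad \text{as } n \to \infty.$$

Thus, $a_k = (1/\pi) \int_{-\pi}^{\pi} f(x) \cos kx\,dx$. Similarly, $b_k = (1/\pi) \int_{-\pi}^{\pi} f(x) \sin kx\,dx$. Since $f \in L_2[-\pi, \pi]$, the rest is easy. $\square$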
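The heart of the Cauchy estimate in this proof is orthogonality: the $L_2$-norm (19.7) of a block $\sum_{k=m+1}^{n} (a_k \cos kx + b_k \sin kx)$ is just $\sum_{k=m+1}^{n} (a_k^2 + b_k^2)$. The sketch below checks this numerically for the illustrative, square-summable choice $a_k = 1/k^2$, $b_k = 1/k$ (any other choice would do; the helper `l2_sq` is hypothetical). The midpoint rule with $N$ equally spaced nodes on $[-\pi, \pi]$ integrates trigonometric polynomials of degree less than $N$ exactly, so the two numbers should agree up to rounding error.

```python
import math


def l2_sq(g, nodes=20_000):
    """||g||_2^2 in the normalization (19.7), via the midpoint rule on [-pi, pi]."""
    h = 2 * math.pi / nodes
    return sum(g(-math.pi + (i + 0.5) * h) ** 2 for i in range(nodes)) * h / math.pi


a = {k: 1 / k ** 2 for k in range(1, 9)}  # illustrative coefficients only
b = {k: 1 / k for k in range(1, 9)}
m, n = 3, 8


def block(x):
    """T_n(x) - T_m(x) = sum_{k=m+1}^{n} (a_k cos kx + b_k sin kx)."""
    return sum(a[k] * math.cos(k * x) + b[k] * math.sin(k * x) for k in range(m + 1, n + 1))


norm_sq = l2_sq(block)
coeff_sum = sum(a[k] ** 2 + b[k] ** 2 for k in range(m + 1, n + 1))
print(norm_sq, coeff_sum)
```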


We can easily collect several of our observations into a single “abstract” formulation: The map that sends an $f \in L_2[-\pi, \pi]$ into its sequence of Fourier coefficients $(a_0/\sqrt{2}, a_1, b_1, a_2, b_2, \ldots)$ is a linear isometry from $L_2[-\pi, \pi]$ onto $\ell_2$! Indeed, since the map is clearly linear, Parseval’s equation (19.10) tells us that the map is an isometry into $\ell_2$, while Proposition 19.19 tells us that the map is, in fact, onto $\ell_2$.

This observation, which is itself sometimes called the Riesz-Fischer theorem, is a seminal result in functional analysis. It says that everything we need to know about the “big” space of functions $L_2[-\pi, \pi]$ could be gleaned from the “little” space of sequences $\ell_2$. In particular, the proof that $L_2[-\pi, \pi]$ is complete, which would appear to require several measure-theoretic tools, could apparently be deduced from the elementary fact that $\ell_2$ is complete. Likewise, Proposition 19.18 (the fact that the trigonometric system is complete in $L_2[-\pi, \pi]$ in the classical sense) should follow from the analogous (and much simpler) result that each element $x \in \ell_2$ is the norm limit of the sequence of truncated elements $(x_1, \ldots, x_n, 0, 0, \ldots)$. Amazing!

The circle of ideas represented by the various Riesz-Fischer theorems constitutes 
one of the earliest examples of a functional analytic argument: In this case, a “soft” 
fact about isometries between abstract spaces yields “hard” information about Fourier 
series. 


EXERCISES 

▷ 65. Suppose that we renormalize $L_p[-\pi, \pi]$ by setting

$$\|f\|_p = \left( \frac{1}{\pi} \int_{-\pi}^{\pi} |f(x)|^p\,dx \right)^{1/p}$$

for $1 \le p < \infty$ (but leave $\|f\|_\infty$ as in equation (19.5)). Check that Hölder’s inequality and Minkowski’s inequality remain true in this new setting. The renormalized space $L_p[-\pi, \pi]$ is obviously still complete. Why?

▷ 66. With the $L_p$-norms defined as in Exercise 65, check that $\|f\|_p \le \|f\|_q$ for any $1 \le p < q < \infty$ and any $f \in L_q[-\pi, \pi]$.

67. With the $L_p$-norms defined as in Exercise 65, prove that we still have $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$ for $f \in L_\infty[-\pi, \pi]$. (In other words, there is no need to scale the $L_\infty[-\pi, \pi]$-norm.)



Notes and Remarks 

Much of the material in this chapter is due to the great Hungarian mathematician Frigyes 
(Frederic, Friedrich) Riesz. Riesz introduced convergence in measure in Riesz [1909a], 
wherein he proved that a sequence converging in measure has a subsequence converging 
a.e. (Theorem 19.4 and Corollary 19.5). The fact that convergence a.e. over a set of finite 
measure implied convergence in measure (Theorem 19.3) had already been pointed out 
by Lebesgue [1906]. As an application, Riesz points out that the Fourier series of a 
Lebesgue square-integrable function must converge in this “general” sense (combine 
the result of Exercise 30 with Proposition 19.18). In Riesz [1910a], Riesz points out 
that Fatou’s Lemma and Lebesgue’s Dominated Convergence Theorem are valid for 
convergence in measure (see Exercises 21 and 22). 

Fréchet [1921] first proved that convergence in measure could be described by a metric, namely, $d(f, g) = \inf\{\varepsilon + m\{|f - g| > \varepsilon\} : \varepsilon > 0\}$. Another metric (for convergence in probability) is discussed in Dudley [1989; §9.2]. The counterexamples discussed in Exercises 19 and 20 were pointed out to me by D. J. H. Garling and S. J. Dilworth.

Theorem 19.17 is often called Mercer's theorem (see also Exercises 18.55 and 18.56,
and the notes to Chapter Eighteen). For a discussion of the contributions of Riemann 
and Lebesgue, see Hawkins [1970]. 
In 1908, Erhard Schmidt (this is the Schmidt of the “Gram-Schmidt process”)
introduced what he called “function spaces” (Schmidt [1908]). In modern terminology,
Schmidt developed the general theory of the space that we would call $\ell_2$, the collection
of all sequences $(z_j)$ of complex numbers satisfying $\sum_{j=1}^\infty |z_j|^2 < \infty$ and endowed
with the inner product $\langle z, w\rangle = \sum_{j=1}^\infty z_j \overline{w_j}$. Schmidt further introduced (possibly
for the first time) the double bar notation $\|z\|$ to denote the norm of $z$, defined by
$\|z\|^2 = \langle z, z\rangle = \sum_{j=1}^\infty z_j \overline{z_j} = \sum_{j=1}^\infty |z_j|^2$. He deduced Bessel's inequality in this
generalized setting, went on to consider various types of convergence, and defined the notion
of a closed subspace. Schmidt's most important contribution from this work is what we
today call the projection theorem.

Schmidt [1908] and Fréchet [1907, 1908] remarked that the space $L_2[a,b]$ supported
a geometry that was completely analogous to Schmidt's space of square-summable 
sequences. 

Meanwhile, in a series of papers from 1907, Riesz [1907a, 1907b, 1908, 1910b]
investigated the collection of (Lebesgue) square-integrable functions, a space that Riesz
would later refer to as $L_2$ (Riesz [1910b]). Riesz was motivated in this by Hilbert's work
on integral equations, and also by the recent introduction of the Lebesgue integral,
an important paper of Pierre Fatou that applied the new integral (Fatou [1906]), and
Fréchet's work on abstract spaces (Fréchet [1906, 1907]). The main result in Riesz
[1907a] states that there is a one-to-one correspondence between Schmidt's space $\ell_2$
and the space $L_2$ (by means of an intermediary orthonormal sequence).

The spaces $L_p$ for $1 < p < \infty$ were introduced in Riesz [1910b]. In fact, the integral
versions of Hölder's inequality (Lemma 19.7) and Minkowski's inequality (Theorem
19.9) are due to Riesz. The result in Exercise 39 was first proved by Radon [1913]
and, independently, by Riesz [1928a, 1928c] (it is sometimes called the Radon-Riesz
theorem); see also Novinger [1992]. To better understand the embedding of $C[a,b]$
into $L_p[a,b]$, as in Exercises 37, 46, 56 and 60, and Corollary 19.15, see the note by
Zaanen [1986].

Independently, and at nearly the same time as Riesz, Ernst Fischer [1907a, 1907b]
considered the notion of convergence in mean for square-summable functions, that
is, convergence in $L_2$-norm. Fischer's most important result, in modern language, is
the fact that $L_2$ is complete with respect to convergence in mean. From this, Fischer
deduced Riesz’s result, above, and the combined result is usually referred to as the 
Riesz-Fischer theorem. Today this result is viewed as a remarkable discovery, but at 
the time it was considered a mere technical observation in a very specialized area. 

The “$L_p$-theory” was originally introduced using the Lebesgue integral, and was
offered as an early application of the power of Lebesgue's new theory. The Riesz-Fischer
theorem stands out as an important early contribution to both harmonic and
functional analysis. It would ultimately lead to the modern theory of Hilbert spaces, that
is, complete normed spaces in which the norm is induced by an inner product, such as $\ell_2$
(see Lemma 3.3 and the remarks above) and $L_2$ (see Observation 15.1 (c)). For a more
thorough history of the development of function spaces, the Riesz-Fischer theorem,
and the early history of functional analysis, see Bernkopf [1966, 1967], Dieudonné
[1981], Dudley [1989], Dunford and Schwartz [1958], Hawkins [1970], Kline [1972],
Monna [1973], Nikol'skij [1992], and Taylor [1982].
Although Riesz's observation that a subsequence of $(s_n(f))$ converges pointwise
a.e. to $f \in L_2$ is quite general, it would be more satisfying to know that the sequence
$(s_n(f))$ itself converged pointwise a.e. to $f$. Since it is a natural question, Luzin was
led to pose this as a problem in 1915. It would go unsolved for over 50 years. That
it is, indeed, true that each $f \in L_2[-\pi,\pi]$ is the a.e. limit of its Fourier series is a
very deep modern result due to Lennart Carleson [1966]. Carleson's theorem marked
the end of a centuries-long search for a general convergence result on Fourier series.
Carleson's theorem was later generalized to $L_p[-\pi,\pi]$, $1 < p < \infty$, by Hunt [1971].
See Mozzochi [1971], and also Goffman and Waterman [1970] and Halmos [1978].
Lebesgue's Differentiation Theorem

In the last several chapters, we have raised questions about differentiation and about 
the Fundamental Theorem of Calculus that have yet to be answered. For example: 

• For which $f$ does the formula $\int_a^b f' = f(b) - f(a)$ hold? If $f'$ is to be integrable,
then at the very least we will need $f'$ to exist almost everywhere in $[a,b]$. But this
alone is not enough: Recall that the Cantor function $f : [0,1] \to [0,1]$ satisfies
$f' = 0$ a.e., but $\int_0^1 f' = 0 \ne 1 = f(1) - f(0)$.

• Stated in slightly different terms: If $g$ is integrable, is the function $f(x) = \int_a^x g$
differentiable? And, if so, is $f' = g$ in this case? For which $f$ is it true that $f(x) = \int_a^x g$
for some integrable $g$?

In our initial discussion of the Stieltjes integral, we briefly considered the problem 
of finding the density of a thin metal rod with a known distribution of mass. That is, 
we were handed an increasing function F(x) that gave the mass of that portion of the 
rod lying on [a, x ], and we asked for its density f(x) = F'(x). We side-stepped this 
question entirely at the time, defining a new integral in the process, but perhaps it merits 
posing again. 

• Given $\alpha$ increasing, is $\alpha$ differentiable at enough points so as to have
$\int_a^b f\,d\alpha = \int_a^b f(x)\,\alpha'(x)\,dx$ hold for, say, all continuous $f$? That is, is every
Riemann-Stieltjes integral a Lebesgue integral? Or even a Riemann integral?

• In particular, if $f$ is of bounded variation, does $f'$ exist? Is $f'$ integrable? If so, is
it the case that $V_a^b f = \int_a^b |f'|$? This would give the analogue, in one dimension, of
the integral formula for arc length.

• A certain special case is worth considering on its own: Early on in our discussion of
Lebesgue measure, we encountered the function $f(x) = m(E \cap (-\infty, x])$, where $E$
is a measurable set of finite measure. We might also write $f(x) = \int_{-\infty}^x \chi_E$, which
makes it all the easier to see that $f$ is continuous. The function $f$ represents the
distribution of mass of an object whose density is $\chi_E$. The question in this case is
whether $f$ is differentiable and, if so, whether $f' = \chi_E$.

In this chapter, thanks to the genius of Lebesgue, we will finally supply answers to 
several of these questions. Here is the key result: 
Lebesgue's Differentiation Theorem 20.1. If $f : [a,b] \to \mathbb{R}$ is monotone, then
$f$ has a finite derivative at almost every point in $[a,b]$.

That's the good news. The bad news may come as a surprise to you: Differentiation
is hard! It's nothing that we can't handle, mind you, but it is technically more
demanding than integration. The reason for this is nothing new; we have already seen
that derivatives are harder to come by than integrals. It's easy to see, for example, that
every continuous function on $[0,1]$ is Riemann integrable while, as we now know, the
“typical” continuous function fails to have a finite derivative at even a single point.
(Recall our discussion at the end of Chapter Eleven.) But the news isn't all bad: There
are only a few hard technical details to sort through. The rest is smooth sailing.

Now, since we want to discuss functions that may not be differentiable in the strict
sense, it will help matters if we introduce a “loose” notion of the derivative. An easy
choice here is to consider the derived numbers of a function. Given a function $f : \mathbb{R} \to \mathbb{R}$,
an extended real number $\lambda$ is called a derived number for $f$ at the point $x_0$ if there
exists a sequence $h_n \to 0$ ($h_n \ne 0$) such that

$$\lim_{n\to\infty} \frac{f(x_0 + h_n) - f(x_0)}{h_n} = \lambda.$$

In other words, $\lambda$ is a derived number for $f$ at $x_0$ if some sequence of difference
quotients for $f$ at $x_0$ converges to $\lambda$ (where we include $\lambda = \pm\infty$ as possibilities). We
will abbreviate this lengthy statement using the terse shorthand

$$\lambda = Df(x_0),$$

with the understanding that $Df(x_0)$ denotes just one of possibly many different derived
numbers for $f$ at $x_0$. [In other words, $Df$ is not a function.]

Since we permit infinite derived numbers, it is clear that derived numbers exist
at every point $x_0$. (Why?) Of course, if the derivative $f'(x_0)$ exists (whether finite or
infinite), then $f'(x_0)$ is a derived number for $f$ at $x_0$. In fact, in this case, $f'(x_0)$ is the
only possible derived number for $f$ at $x_0$. (Why?)

As an example, consider the function $f(x) = x\sin(1/x)$, $x \ne 0$, $f(0) = 0$, at the
point $x_0 = 0$. If we set $h_n^{-1} = (4n - 3)\pi/2$, then

$$\frac{f(x_0 + h_n) - f(x_0)}{h_n} = \frac{h_n \sin(h_n^{-1})}{h_n} = \sin\frac{(4n - 3)\pi}{2} = 1$$

for all $n = 1, 2, \ldots$. Thus, $\lambda = 1$ is a derived number for $f$ at $0$. It is not hard to see
that every number in $[-1,1]$ is a derived number for $f$ at $0$.
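The computation above is easy to confirm numerically, and the same idea exhibits the other derived numbers at $0$. In the Python sketch below (an illustration; the helper names are ours), the quotient at $0$ is $\sin(1/h)$, so choosing $1/h_n = c + 2\pi n$ with $c = \arcsin\lambda$ makes every quotient equal to $\sin c = \lambda$ for any $\lambda \in [-1,1]$. This is one way to approach Exercise 2.

```python
import math

def f(x):
    """f(x) = x sin(1/x) for x != 0, and f(0) = 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0

def difference_quotient(g, x0, h):
    """The difference quotient (g(x0 + h) - g(x0)) / h for one nonzero h."""
    return (g(x0 + h) - g(x0)) / h

# The sequence used in the text: 1/h_n = (4n - 3) pi / 2, so each quotient is 1.
h_text = [2.0 / ((4 * n - 3) * math.pi) for n in range(1, 51)]
quotients_text = [difference_quotient(f, 0.0, h) for h in h_text]

def sequence_for(lam, count=50):
    """A null sequence along which every quotient at 0 equals lam, for -1 <= lam <= 1.

    With c = asin(lam) and 1/h_n = c + 2 pi n, each quotient is sin(1/h_n) = sin(c) = lam.
    """
    c = math.asin(lam)
    return [1.0 / (c + 2 * math.pi * n) for n in range(1, count + 1)]

print("text sequence, last quotient:", round(quotients_text[-1], 9))
for lam in (-1.0, -0.5, 0.0, 0.3, 1.0):
    qs = [difference_quotient(f, 0.0, h) for h in sequence_for(lam)]
    print(f"target {lam:+.1f}: last quotient {qs[-1]:+.9f}")
```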


EXERCISES 

1. Compute the derived numbers for $f = \chi_{\mathbb{Q}}$.

2. Consider $f(x) = x\sin(1/x)$, $x \ne 0$, $f(0) = 0$, at the point $x_0 = 0$. Show that
every number in $[-1,1]$ is a derived number for $f$ at $0$.

▷ 3. Let $f : [a,b] \to \mathbb{R}$. Show that derived numbers for $f$ exist at every point $x_0$
in $[a,b]$. [Hint: See, for example, Exercise 1.26.]
▷ 4. If $f : [a,b] \to \mathbb{R}$ is increasing, show that all of the derived numbers for $f$ are
nonnegative (i.e., in $[0,\infty]$).

▷ 5. Let $f : \mathbb{R} \to \mathbb{R}$ and let $x_0 \in \mathbb{R}$. Prove that $f'(x_0)$ exists (as a finite real number)
if and only if all of the derived numbers for $f$ at $x_0$ are equal (and finite). Is this still
true when $f'(x_0) = \pm\infty$?

6. Let $f, g : \mathbb{R} \to \mathbb{R}$, let $x_0 \in \mathbb{R}$, and suppose that $g'(x_0)$ exists as a finite real
number. Show that $\lambda$ is a derived number for $f$ at $x_0$ if and only if $\lambda + g'(x_0)$ is a
derived number for $f + g$ at $x_0$.

7. If $f : (a,b) \to \mathbb{R}$ is differentiable, show that $f'$ is Borel measurable. If $f$ is
only differentiable a.e., show that $f'$ is still Lebesgue measurable.

8. If $f'(x)$ exists and satisfies $|f'(x)| \le K$ for all $x$ in $[a,b]$, prove that
$m^*(f(E)) \le K\,m^*(E)$ for any $E \subset [a,b]$.


With the notion of derived numbers (and Exercise 5) at our disposal, we can now
describe our plan of attack on Lebesgue's theorem. To say that a function $f$ has a finite
derivative almost everywhere is, apart from the question of infinite derived numbers
(handled in Corollary 20.4), the same as saying that the set of points $x_0$ at which
$f$ has two different derived numbers, say $D_1 f(x_0) < D_2 f(x_0)$, has measure zero. To
address this, we will use a bit of standard trickery and consider instead those derived
numbers that satisfy $D_1 f(x_0) < p < q < D_2 f(x_0)$, where $p < q$ are real numbers.
Thus, we would like to know something about the measure of the set of points at which
either $Df(x) < p$ or $Df(x) > q$ occurs.

Now Lebesgue's theorem concerns a monotone function $f$, but it should be clear that
we need only consider the case where $f$ is increasing. In fact, we will first consider the
case where $f$ is strictly increasing; the general case will follow easily from this. Finally,
we can circumvent occasional concerns about the domain of $f$ simply by assuming that
every function $f : [a,b] \to \mathbb{R}$ has been extended to all of $\mathbb{R}$ by setting $f(x) = f(a)$
for $x < a$ and $f(x) = f(b)$ for $x > b$.


Lemma 20.2. Let $f : [a,b] \to \mathbb{R}$ be strictly increasing, let $E \subset [a,b]$, and let
$0 < p < \infty$. If for every $x \in E$, there exists at least one derived number for $f$
satisfying $Df(x) \le p$, then $m^*(f(E)) \le p\,m^*(E)$.


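proof. Let $\varepsilon > 0$, and choose a bounded open set $G \supset E$ such that $m(G) <
m^*(E) + \varepsilon$. For each $x_0 \in E$, choose a null sequence $(h_n)$, with $h_n \ne 0$ for all $n$,
such that

$$\lim_{n\to\infty} \frac{f(x_0 + h_n) - f(x_0)}{h_n} = Df(x_0) \le p.$$

Now consider the intervals

$$d_n(x_0) = \begin{cases} [\,x_0,\ x_0 + h_n\,], & \text{if } h_n > 0,\\ [\,x_0 + h_n,\ x_0\,], & \text{if } h_n < 0, \end{cases} \tag{20.1a}$$

and

$$\Delta_n(x_0) = \begin{cases} [\,f(x_0),\ f(x_0 + h_n)\,], & \text{if } h_n > 0,\\ [\,f(x_0 + h_n),\ f(x_0)\,], & \text{if } h_n < 0. \end{cases} \tag{20.1b}$$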
The intervals $\{d_n(x_0) : x_0 \in E,\ n \ge 1\}$ cover $E$ while the intervals $\{\Delta_n(x_0) :
x_0 \in E,\ n \ge 1\}$ cover $f(E)$. Notice that since $f$ is strictly increasing, we have
$m(\Delta_n(x_0)) > 0$ for any $x_0$, $n$.

Since $h_n \to 0$, we may suppose that $d_n(x_0) \subset G$ for all $n$. We may also suppose
that

$$\frac{f(x_0 + h_n) - f(x_0)}{h_n} < p + \varepsilon \tag{20.2}$$

for all $n$. Since

$$m(d_n(x_0)) = |h_n| \quad\text{and}\quad m(\Delta_n(x_0)) = |f(x_0 + h_n) - f(x_0)|,$$

equation (20.2) can be written as

$$m(\Delta_n(x_0)) < (p + \varepsilon)\,m(d_n(x_0))$$

for all $n$. In particular, we must have $m(\Delta_n(x_0)) \to 0$ as $h_n \to 0$. Thus, the
intervals $\{\Delta_n(x_0) : x_0 \in E,\ n \ge 1\}$ actually form a Vitali cover for $f(E)$.

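By Theorem 16.27, we can find countably many pairwise disjoint intervals
$\{\Delta_{n_i}(x_i)\}$ such that

$$m^*\Big(f(E) \setminus \bigcup_{i=1}^\infty \Delta_{n_i}(x_i)\Big) = 0.$$

Thus,

$$m^*(f(E)) \le \sum_{i=1}^\infty m(\Delta_{n_i}(x_i)) \le (p + \varepsilon)\sum_{i=1}^\infty m(d_{n_i}(x_i)). \tag{20.3}$$

But the intervals $\{d_{n_i}(x_i)\}$ must also be pairwise disjoint. (Why?) Hence,

$$\sum_{i=1}^\infty m(d_{n_i}(x_i)) = m\Big(\bigcup_{i=1}^\infty d_{n_i}(x_i)\Big) \le m(G). \tag{20.4}$$

Combining equations (20.3) and (20.4) yields

$$m^*(f(E)) \le (p + \varepsilon)\,m(G) \le (p + \varepsilon)\big(m^*(E) + \varepsilon\big).$$

Letting $\varepsilon \to 0$, we get $m^*(f(E)) \le p\,m^*(E)$. □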
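The proof uses only one property of the intervals (20.1a) and (20.1b): the ratio $m(\Delta_n(x_0))/m(d_n(x_0))$ equals the difference quotient, so a bound on derived numbers becomes a bound on lengths. The Python sketch below checks this bookkeeping for one strictly increasing function; the choice $f(x) = x + x^3$, the point $x_0 = 0.5$, and the helper names are our own and purely illustrative.

```python
def d_interval(x0, h):
    """d_n(x0) from (20.1a): [x0, x0 + h] if h > 0, [x0 + h, x0] if h < 0."""
    return (x0, x0 + h) if h > 0 else (x0 + h, x0)

def delta_interval(f, x0, h):
    """Delta_n(x0) from (20.1b), for a strictly increasing f."""
    return (f(x0), f(x0 + h)) if h > 0 else (f(x0 + h), f(x0))

def length(interval):
    """Lebesgue measure of a closed interval given as (left, right)."""
    left, right = interval
    return right - left

f = lambda x: x + x ** 3   # strictly increasing, f'(x) = 1 + 3x^2 (illustrative choice)
x0 = 0.5                   # f'(0.5) = 1.75
hs = [0.1, -0.1, 0.01, -0.01, 0.001, -0.001]

for h in hs:
    d, D = d_interval(x0, h), delta_interval(f, x0, h)
    quotient = (f(x0 + h) - f(x0)) / h
    print(f"h = {h:+.3f}: m(Delta)/m(d) = {length(D) / length(d):.6f}, "
          f"difference quotient = {quotient:.6f}")
```

As $h \to 0$ the ratios approach $f'(0.5) = 1.75$, the only derived number of this differentiable $f$ at $x_0$.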


A similar, but slightly more complicated line of reasoning applies to the set of points
where $Df(x) \ge q$.


Lemma 20.3. Let $f : [a,b] \to \mathbb{R}$ be strictly increasing, let $E \subset [a,b]$, and let
$0 < q < \infty$. If, for every $x \in E$, there exists at least one derived number for $f$
satisfying $Df(x) \ge q$, then $m^*(f(E)) \ge q\,m^*(E)$.


proof. Let $\varepsilon > 0$. Since $f(E)$ is bounded, we may choose a bounded open set
$G \supset f(E)$ such that $m(G) < m^*(f(E)) + \varepsilon$. For each $x_0 \in E$, choose a null
sequence $(h_n)$ such that

$$\lim_{n\to\infty} \frac{f(x_0 + h_n) - f(x_0)}{h_n} = Df(x_0) \ge q.$$
As before, we may suppose that

$$\frac{f(x_0 + h_n) - f(x_0)}{h_n} > q - \varepsilon$$

for all $n$. Thus, if we define the intervals $d_n(x_0)$ and $\Delta_n(x_0)$ exactly as in
equation (20.1), then we have

$$m(\Delta_n(x_0)) > (q - \varepsilon)\,m(d_n(x_0))$$

for all $n$ and all $x_0 \in E$.

We would like to argue, as before, that by reducing to countably many intervals
we can compare the measures of $E$ and $f(E)$, by way of the open set $G$. In this
case, we want to know when $\Delta_n(x_0)$ is contained in $G$. But, if $x_0 \in E$ is a point of
continuity of $f$, then $\Delta_n(x_0)$ will be completely contained in $G$ for all $n$ sufficiently
large. (Why?) This works at nearly every point $x_0 \in E$: If we let $S$ denote the set
of points in $E$ at which $f$ is continuous, then, since $f$ is monotone, the set $E \setminus S$ is
at most countable. In summary, we will suppose that $\Delta_n(x_0) \subset G$ actually occurs
for all $n$ and all $x_0 \in S$. Now we are ready for Vitali!

The intervals $\{d_n(x_0) : x_0 \in S,\ n \ge 1\}$ obviously form a Vitali cover for $S$.
Thus, there are countably many pairwise disjoint intervals $\{d_{n_i}(x_i)\}$ such that

$$m^*\Big(S \setminus \bigcup_{i=1}^\infty d_{n_i}(x_i)\Big) = 0.$$

Hence,

$$m^*(S) \le \sum_{i=1}^\infty m(d_{n_i}(x_i)) \le \frac{1}{q - \varepsilon}\sum_{i=1}^\infty m(\Delta_{n_i}(x_i)). \tag{20.5}$$

Now, since $f$ is strictly increasing, the intervals $\{\Delta_{n_i}(x_i)\}$ must also be pairwise
disjoint. Consequently,

$$\sum_{i=1}^\infty m(\Delta_{n_i}(x_i)) = m\Big(\bigcup_{i=1}^\infty \Delta_{n_i}(x_i)\Big) \le m(G). \tag{20.6}$$

Combining our observations in light of equations (20.5) and (20.6) yields

$$m^*(E) = m^*(S) \le \frac{1}{q - \varepsilon}\big[m^*(f(E)) + \varepsilon\big].$$

Letting $\varepsilon \to 0$, we get $m^*(f(E)) \ge q\,m^*(E)$. □

The hard work is (almost) over! Now we sit back and collect the benefits: 

Corollary 20.4. If $f : [a,b] \to \mathbb{R}$ is increasing, then the set of points at which
at least one derived number for $f$ is infinite has measure zero.


proof. This is nearly obvious if $f$ is strictly increasing. In this case, Lemma
20.3 tells us that if the set $E = \{x : Df(x) = +\infty\}$ has nonzero measure, then the
set $f(E)$ would have infinite measure. (Why?) This is clearly impossible since
$f(E) \subset [f(a), f(b)]$.
If $f$ is not strictly increasing, we consider instead the function $g(x) = f(x) + x$.
Since $g$ is strictly increasing and satisfies

$$\frac{g(x + h) - g(x)}{h} = \frac{f(x + h) - f(x)}{h} + 1,$$

it is clear that $\{x : Df(x) = +\infty\} = \{x : Dg(x) = +\infty\}$. The latter set has
measure zero. □

Corollary 20.5. Let $f : [a,b] \to \mathbb{R}$ be increasing and let $0 < p < q < \infty$. If
at every point $x$ in some set $E_{p,q} \subset [a,b]$ there exist two derived numbers for $f$
satisfying $D_1 f(x) < p < q < D_2 f(x)$, then $m(E_{p,q}) = 0$.

proof. If $f$ is strictly increasing, then Lemmas 20.2 and 20.3 imply that

$$q\,m^*(E_{p,q}) \le m^*(f(E_{p,q})) \le p\,m^*(E_{p,q}),$$

and hence, since $p < q$ and $m^*(E_{p,q}) < \infty$, that $m(E_{p,q}) = 0$.

When $f$ is not strictly increasing, we simply apply the first part of the proof
to the function $g(x) = f(x) + x$, replacing $p$ by $p + 1$ and $q$ by $q + 1$. □

Finally we are ready for the proof of Lebesgue’s theorem. 

Theorem 20.6. If $f : [a,b] \to \mathbb{R}$ is increasing, then $f$ has a finite derivative at
almost every point in $[a,b]$.

proof. Let $E$ denote the set of points $x \in [a,b]$ at which $f'(x)$ does not exist.
Now a bit of shorthand makes the rest of the proof easy: Let's agree to write
$\{x : D_1 f(x) < D_2 f(x)\}$ to denote the set of points $x$ at which $f$ has two different
derived numbers $D_1 f(x) < D_2 f(x)$. Then,

$$E = \{x : D_1 f(x) < D_2 f(x)\} = \bigcup_{\substack{p < q \\ p,\, q \in \mathbb{Q}}} \{x : D_1 f(x) < p < q < D_2 f(x)\} = \bigcup_{\substack{p < q \\ p,\, q \in \mathbb{Q}}} E_{p,q},$$

where $E_{p,q} = \{x : D_1 f(x) < p < q < D_2 f(x)\}$ denotes the set of points $x$ at
which $f$ has two different derived numbers satisfying $D_1 f(x) < p < q < D_2 f(x)$.
From Corollary 20.5, each $E_{p,q}$ has measure zero. There are at most countably
many such sets for $p, q \in \mathbb{Q}$ and hence $m(E) = 0$; that is, $f'(x)$ exists at almost
every point in $[a,b]$. From Corollary 20.4, we know that the set of points at
which $f'(x) = +\infty$ has measure zero; thus, $f'(x)$ exists as a finite real number
almost everywhere. □

Corollary 20.7. If $f \in BV[a,b]$, then $f$ has a finite derivative at almost every
point in $[a,b]$.


EXERCISES 

▷ 9. Consider the Cantor function $f$ on $[0,1]$. We know that $f'(x) = 0$ when
$x \in [0,1] \setminus \Delta$, the complement of the Cantor set. Compute $f'(x)$, if possible, when
$x \in \Delta$. [Hint: If $x$ is an endpoint, show that $f'(x)$ does not exist; otherwise, show
that $f'(x) = +\infty$.]

10. Prove or disprove: If every derived number for $f$ on $[a,b]$ is nonnegative
(or $+\infty$), then $f$ is increasing.

11. If $m(E) = 0$, prove that there is a continuous, increasing function $f : \mathbb{R} \to \mathbb{R}$
such that $f'(x) = +\infty$ at each point $x \in E$. [Hint: Let $(U_n)$ be a decreasing sequence
of open sets containing $E$ with $m(U_n) < 2^{-n}$. Now let $f_n(x) = m((-\infty, x) \cap U_n)$
and let $f = \sum_{n=1}^\infty f_n$.]


Now that we have expanded our collection of differentiable functions, the next item 
on the agenda is the Fundamental Theorem of Calculus. To address this and other 
questions raised at the beginning of this chapter, we will first need to discuss the 
measurability and integrability of derivatives. 

Theorem 20.8

(i) If $f$ is increasing on $[a,b]$, then $f'$ is measurable, $f' \ge 0$ a.e., and $\int_a^b f' \le
f(b) - f(a)$.

(ii) If $f \in BV[a,b]$, then $f' \in L_1[a,b]$ and $\int_a^b |f'| \le V_a^b f$.


proof. Recall our assumption that any function $f$ on $[a,b]$ has been extended
to all of $\mathbb{R}$ by setting $f(x) = f(a)$ for $x < a$ and $f(x) = f(b)$ for $x > b$.

The proof of (i) is easier than you might imagine: An increasing function $f$ is
measurable, and

$$f'(x) = \lim_{n\to\infty} n\Big(f\big(x + \tfrac1n\big) - f(x)\Big)$$

for almost every $x$ in $[a,b]$. Hence, $f'$ is measurable and $f' \ge 0$ a.e. (Why?)
Next we use Fatou's Lemma to estimate $\int_a^b f'$:

$$\begin{aligned}
\int_a^b f' &= \int_a^b \lim_{n\to\infty} n\Big(f\big(x + \tfrac1n\big) - f(x)\Big)\,dx\\
&\le \liminf_{n\to\infty} n\Big(\int_a^b f\big(x + \tfrac1n\big)\,dx - \int_a^b f(x)\,dx\Big)\\
&= \liminf_{n\to\infty} n\Big(\int_{a+1/n}^{b+1/n} f - \int_a^b f\Big)\\
&= \liminf_{n\to\infty} n\Big(\int_b^{b+1/n} f - \int_a^{a+1/n} f\Big)\\
&\le f(b) - f(a),
\end{aligned}$$

since $f$ is increasing and since $f(x) = f(b)$ for $x > b$. Please note that the
“change of variable” is easily justified here; indeed, since $f$ is monotone, each of
the integrals above is actually a Riemann integral.

Now suppose that $f$ is of bounded variation on $[a,b]$, and recall that we may
write $f = v - (v - f)$, where $v(x) = V_a^x f$, and where $v$ and $v - f$ are both
increasing. Of course, then $f' = v' - (v - f)'$ exists a.e. and is measurable. But,
by recalling a basic inequality, we really get something more: For $x < y$ we have

$$|f(y) - f(x)| \le V_x^y f = v(y) - v(x),$$

and it follows that $|f'| \le v'$ a.e. So, from the first part of the proof, $f'$ is integrable
and

$$\int_a^b |f'| \le \int_a^b v' \le v(b) - v(a) = V_a^b f. \qquad \square$$
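The inequality in part (i) can be strict, and the Cantor function shows how: $f' = 0$ a.e., so $\int_0^1 f' = 0$, while $f(1) - f(0) = 1$. The Python sketch below is our own illustration (the digit-by-digit evaluation of the Cantor function, the helper names, and the choice $n = 3^k$ are assumptions made for convenience). It computes the integrals $\int_0^1 n\big(f(x + \tfrac1n) - f(x)\big)\,dx$ from the Fatou step, both directly and after the change of variable. For $n = 3^k$ both come out to $1 - 2^{-(k+1)}$, so these integrals tend to $f(1) - f(0) = 1$ even though their integrands tend to $f' = 0$ almost everywhere.

```python
import math

def cantor(x, depth=60):
    """Cantor function, extended by 0 to the left of 0 and by 1 to the right of 1.

    Reads ternary digits of x: a digit 1 means x lies in a removed middle third,
    where the function is constant; digits 0 and 2 contribute binary digits 0 and 1.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2.0
    return value

def midpoint_integral(g, a, b, cells):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / cells
    return h * math.fsum(g(a + (j + 0.5) * h) for j in range(cells))

# Cells of width 3^-8, aligned with the ternary structure of the Cantor set. On such a
# grid the midpoint rule is exact for the Cantor function (by self-similarity).
CELLS = 3 ** 8
results = []
for k in range(1, 5):
    n = 3 ** k
    # Integral over [0, 1] of the difference quotients n (f(x + 1/n) - f(x)).
    lhs = midpoint_integral(lambda x: n * (cantor(x + 1 / n) - cantor(x)), 0.0, 1.0, CELLS)
    # After the change of variable: n (integral of f over [1, 1 + 1/n] minus over [0, 1/n]).
    rhs = n * (midpoint_integral(cantor, 1.0, 1.0 + 1 / n, CELLS // n)
               - midpoint_integral(cantor, 0.0, 1 / n, CELLS // n))
    results.append((n, lhs, rhs))
    print(f"n = {n:>2}: direct {lhs:.6f}, after change of variable {rhs:.6f}")
```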

We have made some progress on one of our questions: We now know that if $f$ is
of bounded variation, then $f'$ exists a.e. and $f'$ is integrable. This still is not enough
to make the formula $f(b) - f(a) = \int_a^b f'$ hold (recall the Cantor function). But is
it necessary to have $f \in BV[a,b]$ in order that the formula hold? The answer is:
Yes, and then some. To see this, we will turn the question around: If we set $f(x) = \int_a^x g$,
where $g$ is integrable, is $f$ of bounded variation? If so, is $f' = g$ a.e. (in which case,
$f(x) = \int_a^x f'$)? The answers are supplied by our next result.

Theorem 20.9. Let $g$ be integrable on $[a,b]$, and let $f(x) = \int_a^x g$. Then:

(i) $f \in C[a,b] \cap BV[a,b]$ and $\int_a^b |f'| \le V_a^b f \le \int_a^b |g|$.

(ii) $f = 0$ if and only if $g = 0$ a.e.

(iii) $f' = g$ a.e.; hence, $f(x) = \int_a^x f'$ and $V_a^b f = \int_a^b |f'|$.


proof. (i) is very easy. We have already seen that indefinite integrals are
continuous (see Corollary 18.21). That $f$ is of bounded variation is surprisingly easy,
too. Notice that

$$f(x) = \int_a^x g = \int_a^x g^+ - \int_a^x g^-,$$

and both $\int_a^x g^+$ and $\int_a^x g^-$ are increasing. Hence, by the triangle inequality for
variations,

$$V_a^b f \le \int_a^b g^+ + \int_a^b g^- = \int_a^b |g|.$$

That $\int_a^b |f'| \le V_a^b f$ is a consequence of Theorem 20.8 (ii).

Next, (ii) follows from considering $\int g$ as a measure (see Corollary 18.26).
If $f = 0$, then

$$\begin{aligned}
\int_a^x g = 0 \ \text{ for all } x
&\implies \int_c^d g = 0 \ \text{ for all } (c,d) \subset [a,b]\\
&\implies \int_U g = 0 \ \text{ for all open sets } U \subset [a,b]\\
&\implies \int_G g = 0 \ \text{ for all } G_\delta\text{-sets } G \subset [a,b]\\
&\implies \int_E g = 0 \ \text{ for all measurable } E \subset [a,b],
\end{aligned}$$

since every measurable set is, up to a null set, a $G_\delta$ set. Consequently, $g = 0$ a.e.
Since $g = 0$ a.e. always forces $f = 0$, this proves (ii).

Finally, we're ready for the proof of (iii). By considering $g^+$ and $g^-$ separately,
we may suppose that $g \ge 0$. Of course, this will make $f$ increasing, and hence
$f' \ge 0$ a.e.

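Now, let's simplify things further by assuming that $g$ is also bounded, say,
$0 \le g \le K$. In this case,

$$0 \le n\Big(f\big(x + \tfrac1n\big) - f(x)\Big) = n\int_x^{x+1/n} g \le K,$$

and $n\big(f(x + \tfrac1n) - f(x)\big) \to f'(x)$ a.e. So, by the Dominated Convergence
Theorem,

$$\begin{aligned}
\int_a^x f' &= \lim_{n\to\infty} n\int_a^x \Big(f\big(t + \tfrac1n\big) - f(t)\Big)\,dt
= \lim_{n\to\infty}\Big[\,n\int_x^{x+1/n} f - n\int_a^{a+1/n} f\,\Big]\\
&= f(x) - f(a) = f(x), \quad\text{because } f \text{ is continuous.}
\end{aligned}$$

And now, $\int_a^x f' = f(x) = \int_a^x g$ for all $x$, implies that $f' = g$ a.e., from (ii).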

In the general case (where $g$ is integrable and nonnegative but not necessarily
bounded), we truncate $g$ by defining $g_n(x) = g(x)$ if $g(x) \le n$ and $g_n(x) = 0$ otherwise;
that is, $g_n = g \cdot \chi_{\{g \le n\}}$. Note that $g_n \to g$ a.e.

Now set $f_n(x) = \int_a^x g_n$. Since $0 \le g_n \le g$, we have that $f = (f - f_n) + f_n$,
and each of $f - f_n$ and $f_n$ is evidently increasing. But $g_n$ is bounded: $0 \le g_n \le n$;
thus, by the case just proved, $f_n' = g_n$ a.e. Hence,

$$f' = (f - f_n)' + f_n' \ge f_n' = g_n \to g \quad\text{a.e.}$$

It follows that $f' \ge g$ a.e., and this turns out to be enough. Since $f$ is increasing,
we get

$$f(x) = f(x) - f(a) \ge \int_a^x f' \ge \int_a^x g = f(x).$$

Hence, $f' = g$ a.e. □

Corollary 20.10. Let $E$ be a measurable subset of $\mathbb{R}$ with finite measure, and
consider the “distribution” function $f(x) = m(E \cap (-\infty, x])$. Then, for almost
every $x$ in $\mathbb{R}$, the “density” $f'(x)$ exists and satisfies $f' = \chi_E$ a.e. That is, $f'(x) = 1$
for a.e. $x \in E$ and $f'(x) = 0$ for a.e. $x \in E^c$.

proof. As we have already noted, $f(x) = \int_{-\infty}^x \chi_E$. Thus, since $\chi_E$ is integrable,
we have

$$f'(x) = \lim_{h\to 0} \frac1h \int_x^{x+h} \chi_E = \chi_E(x) \quad\text{for a.e. } x. \qquad \square$$
Corollary 20.11. (Lebesgue's Density Theorem) Let $E$ be a measurable set,
and define the metric density of $E$ at a point $x \in \mathbb{R}$ by

$$D_E(x) = \lim_{h\to 0} \frac{1}{2h}\,m\big(E \cap [x - h, x + h]\big),$$

provided that this limit exists. Then, $D_E(x) = 1$ for a.e. $x \in E$ and $D_E(x) = 0$ for
a.e. $x \in E^c$. That is, $D_E = \chi_E$ a.e.

proof. If $m(E) < \infty$, the conclusion follows immediately from Corollary 20.10.
Indeed, in this case, we need only notice that

$$D_E(x) = \lim_{h\to 0} \frac{1}{2h} \int_{x-h}^{x+h} \chi_E,$$

and that this “two-sided” derivative exists and equals $\chi_E$ a.e. (Why?)

Now the limit in question is a local property of $E$: For a given $x$, the existence
of $D_E(x)$ depends only on the set $E \cap [x - 1, x + 1]$, for example, which is a
set of finite measure. To arrive at a single exceptional set that does not depend
on $x$, where the limit may fail to exist, consider the sets $E_n = E \cap [-n, n]$ for
$n = 1, 2, \ldots$. We may conclude that the limit exists and equals $\chi_{E_n}$ for almost
every $x$ in $[-n, n]$. By discarding only countably many such exceptional sets,
each of measure zero, one for each $E_n$, we would then have that $D_E(x)$ exists and
equals $\chi_E$ a.e. □
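To see the density theorem at work on a simple set, the Python sketch below computes the ratios $\frac{1}{2h}\,m(E \cap [x - h, x + h])$ for the hypothetical set $E = [0,1] \cup [2,3]$ (our own choice) and shrinking $h$. The ratios are $1$ at an interior point, $0$ at an exterior point, and $1/2$ at an endpoint; the endpoints form a null set, which is consistent with $D_E = \chi_E$ a.e.

```python
def measure_in_window(intervals, lo, hi):
    """Lebesgue measure of (union of disjoint closed intervals) intersected with [lo, hi]."""
    return sum(max(0.0, min(b, hi) - max(a, lo)) for a, b in intervals)

def density_ratio(intervals, x, h):
    """The ratio m(E intersect [x - h, x + h]) / (2h) for h > 0."""
    return measure_in_window(intervals, x - h, x + h) / (2 * h)

E = [(0.0, 1.0), (2.0, 3.0)]   # hypothetical example set: a finite union of disjoint intervals
hs = [10.0 ** (-k) for k in range(1, 7)]
points = {"interior point 0.5": 0.5, "exterior point 1.5": 1.5, "endpoint 2.0": 2.0}

for label, x in points.items():
    print(label, [round(density_ratio(E, x, h), 6) for h in hs])
```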

We extend this result further by considering locally integrable functions, thus taking
advantage of the fact that differentiation is a local property. A measurable function
$f : \mathbb{R} \to \mathbb{R}$ is said to be locally integrable if $\int_a^b |f| < \infty$ for every bounded interval
$[a,b]$.

Corollary 20.12. Let $f$ be locally integrable. Then, for a.e. $x \in \mathbb{R}$,

$$\lim_{h\to 0} \frac1h \int_x^{x+h} f(t)\,dt = f(x).$$

In fact,

$$\lim_{h\to 0} \frac1h \int_x^{x+h} |f(t) - f(x)|\,dt = 0$$

for a.e. $x \in \mathbb{R}$.

proof. As before, by considering $f\chi_{[-n,n]}$ for each $n$, we might as well suppose
that $f$ is integrable and vanishes off some bounded interval. In this case, the first
conclusion is an immediate consequence of Theorem 20.9.

The second conclusion takes a bit more work. For each rational $r$, the first part
of the theorem supplies a null set $N_r$ such that

$$\lim_{h\to 0} \frac1h \int_x^{x+h} |f(t) - r|\,dt = |f(x) - r| \tag{20.7}$$

for all $x \notin N_r$. Thus, equation (20.7) holds for all $r \in \mathbb{Q}$ and all $x \notin N$, where
$N = \bigcup_{r\in\mathbb{Q}} N_r$ is still a null set.

Now we can make the right-hand side of equation (20.7) arbitrarily small by
letting $r \to f(x)$, and so we must have

$$\lim_{h\to 0} \frac1h \int_x^{x+h} |f(t) - f(x)|\,dt = 0$$

for all $x \notin N$, that is, for a.e. $x$. □

Let's summarize our progress. Assuming that $f'$ is integrable, then, in order for the
formula $f(x) - f(a) = \int_a^x f'$ to hold, it is necessary to have $f \in C[a,b] \cap BV[a,b]$.
In fact, if $f$ is the indefinite integral of any $g \in L_1[a,b]$, then we will have to have at
least $f \in C[a,b] \cap BV[a,b]$. But, as the Cantor function shows, still more is needed
for sufficiency. The missing ingredient is the stronger form of continuity that is typical
of the “measure” $\int g$. Before we formalize this notion, let's take another look at the
Cantor function.

Example 20.13 

The Cantor function $f : [0,1] \to [0,1]$ cannot be written as the indefinite
integral of any $g \in L_1[0,1]$.

proof. Recall that $\Delta = \bigcap_{n=1}^\infty I_n$, where $I_n$ is the “$n$th level Cantor set.” In
particular, the $I_n$ are nested, closed sets satisfying $m(I_n) \to 0$ as $n \to \infty$. More
specifically, $I_n$ is the union of $2^n$ disjoint, closed intervals, each having length
$3^{-n}$, say, $I_n = \bigcup_{i=1}^{2^n} [x_{n,i}, y_{n,i}]$, where the $x_{n,i}$ and $y_{n,i}$ are “endpoints” of $\Delta$. Since
$I_n \supset \Delta$, the Cantor function $f$ maps each $I_n$ onto all of $[0,1]$.

Now suppose that $f(x) = \int_0^x g$ for some $g \in L_1[0,1]$. Then,

$$1 = f(1) - f(0) = \sum_{i=1}^{2^n} \big[f(y_{n,i}) - f(x_{n,i})\big] = \int_{I_n} g. \quad\text{(Why?)}$$

But since $m(I_n) \to 0$, we should also have $\int_{I_n} g \to 0$. Since we are denied this
possibility, no such $g$ can exist. □

The problem, in brief, is that $m(f(\Delta)) = 1$ while $m(\Delta) = 0$, and this (somehow)
precludes the possibility of recovering $f$ from $f'$. We will pursue this idea in detail in
the next section.
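The two facts used in Example 20.13 can be checked with exact rational arithmetic: $m(I_n) = (2/3)^n \to 0$, while the increments of the Cantor function over the $2^n$ intervals of $I_n$ always add up to $1$. The Python sketch below uses `fractions.Fraction`. The helper names are ours, and the Cantor function is evaluated from ternary digits, a procedure that terminates at the endpoints $x_{n,i}$, $y_{n,i}$ because their ternary expansions are finite.

```python
from fractions import Fraction

def cantor_level(n):
    """The 2**n closed intervals [x, y] of the n-th level Cantor set I_n, as exact fractions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for x, y in intervals:
            third = (y - x) / 3
            refined.append((x, x + third))   # keep the left third
            refined.append((y - third, y))   # keep the right third
        intervals = refined
    return intervals

def cantor_exact(x):
    """Cantor function at a rational x in [0, 1] with a terminating ternary expansion."""
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    while x > 0:
        x *= 3
        digit = int(x)        # floor, since x >= 0
        x -= digit
        if digit == 1:        # x lies in (or at the edge of) a removed middle third
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value

for n in range(1, 11):
    I_n = cantor_level(n)
    total_length = sum(y - x for x, y in I_n)
    total_increase = sum(cantor_exact(y) - cantor_exact(x) for x, y in I_n)
    print(f"n = {n:>2}: m(I_n) = {total_length} ~ {float(total_length):.5f}, "
          f"total increase = {total_increase}")
```

Read the other way, the same numbers show that no $\delta > 0$ can serve for $\varepsilon = 1$ in the definition of absolute continuity given in the next section (compare Exercise 15).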


EXERCISES 

12. Find examples of a measurable set $E$ and a point $x$ for which: $D_E(x) = 1/2$;
$D_E(x) = 1/3$; $D_E(x)$ does not exist.

13. Fill in the missing details in the proof of Corollary 20.12. 
Absolute Continuity

Although we have enumerated various “big questions” several times already, one more 
incantation couldn’t hurt. 

Question. Given $f$, when may we write $f$ as an “indefinite integral”? That is,
when does the formula $f(x) = C + \int_a^x g$ hold, for some constant $C$ and some
$g \in L_1$?

Question. Given $f$ with $f' \in L_1$, we may consider the function $g(x) = \int_a^x f'$.
We know that $g' = f'$ a.e., but does this mean that $f = g + C$ for some constant
$C$? For which $f$ is this true?

The answers to these questions turn out to involve a stronger form of continuity 
that is satisfied by “indefinite integrals” or “measures” (see Exercise 18.35 or Lemma 
20.14, below). 

We say that a function $f : [a,b] \to \mathbb{R}$ is absolutely continuous if, for every $\varepsilon > 0$,
there exists a $\delta > 0$ such that $\sum_{i\ge1} |f(b_i) - f(a_i)| < \varepsilon$ whenever $\{(a_i, b_i)\}_{i\ge1}$ is any
sequence (finite or infinite) of disjoint subintervals of $[a,b]$ satisfying $\sum_{i\ge1} (b_i - a_i) < \delta$.

The requirement that the open intervals $\{(a_i, b_i)\}$ be disjoint is sometimes stated
by saying that the corresponding closed intervals $\{[a_i, b_i]\}$ must be nonoverlapping,
a self-explanatory nomenclature. However we choose to say it, notice that
$m\big(\bigcup_{i\ge1} [a_i, b_i]\big) < \delta$ is required.
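Exercise 14 can be previewed numerically: if $|f(x) - f(y)| \le L|x - y|$, then $\sum_i |f(b_i) - f(a_i)| \le L\sum_i (b_i - a_i)$, so $\delta = \varepsilon/L$ works. The Python sketch below is a random spot check, not a proof; the sampling scheme, the seed, and the helper names are our own choices. It uses $f = \sin$ on $[0, 2\pi]$, which satisfies $|\sin x - \sin y| \le |x - y|$, so $L = 1$.

```python
import math
import random

def variation_sum(f, intervals):
    """Sum of |f(b_i) - f(a_i)| over a family of intervals (a_i, b_i)."""
    return sum(abs(f(b) - f(a)) for a, b in intervals)

def random_nonoverlapping(rng, lo, hi, count, max_total):
    """Random disjoint intervals in [lo, hi] with total length < max_total.

    Consecutive pairs of sorted random points give disjoint intervals; each interval
    is then shrunk toward its left endpoint, which keeps the family disjoint.
    """
    points = sorted(rng.uniform(lo, hi) for _ in range(2 * count))
    pairs = [(points[2 * i], points[2 * i + 1]) for i in range(count)]
    total = sum(b - a for a, b in pairs)
    scale = min(1.0, 0.99 * max_total / total) if total > 0 else 1.0
    return [(a, a + scale * (b - a)) for a, b in pairs]

L = 1.0                     # Lipschitz constant of sin
eps = 0.1
delta = eps / L
rng = random.Random(2024)   # fixed seed so the check is reproducible

worst = 0.0
for _ in range(1000):
    family = random_nonoverlapping(rng, 0.0, 2 * math.pi, rng.randint(1, 20), delta)
    assert sum(b - a for a, b in family) < delta
    worst = max(worst, variation_sum(math.sin, family))
print(f"largest sum found: {worst:.4f} (must stay below eps = {eps})")
```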

By way of a simple example, notice that every Lipschitz function is absolutely
continuous. It is also evident that every absolutely continuous function is (uniformly)
continuous. (Why?) If we write $AC[a,b]$ to denote the collection of all absolutely
continuous functions on $[a,b]$, then, as sets, $\mathrm{Lip}_1[a,b] \subset AC[a,b] \subset C[a,b]$.
Our goal in this section is to prove that a function $f$ can be written as $f(x) = C + \int_a^x g$,
where $g \in L_1$, precisely when $f$ is absolutely continuous. To begin, we prove that an
indefinite integral is absolutely continuous.

Lemma 20.14. If $g \in L_1$, then $f(x) = \int_a^x g$ is absolutely continuous. In fact,
given $\varepsilon > 0$, there is a $\delta > 0$ such that $\int_A |g| < \varepsilon$ whenever $m(A) < \delta$.

proof. We begin with the proof of the second statement. Given $\varepsilon > 0$, there is
a bounded, integrable function $h$ such that $\int |g - h| < \varepsilon/2$. If $0 < K < \infty$ is
chosen so that $|h| \le K$, then

$$\int_A |h| \le K\,m(A) < \frac{\varepsilon}{2} \quad\text{whenever } m(A) < \frac{\varepsilon}{2K}.$$

Thus,

$$\int_A |g| \le \int_A |g - h| + \int_A |h| < \varepsilon$$

whenever $m(A) < \varepsilon/(2K) = \delta$.
The first conclusion now follows easily from the second: Given nonoverlapping
intervals $\{[a_i, b_i]\}$, notice that

$$\sum_{i\ge1} |f(b_i) - f(a_i)| = \sum_{i\ge1} \Big|\int_{a_i}^{b_i} g\Big| \le \sum_{i\ge1} \int_{a_i}^{b_i} |g| = \int_A |g|,$$

where $A = \bigcup_{i\ge1} [a_i, b_i]$, and where the last equation holds by Corollary 18.26. □

The absolute continuity of $f(x) = \int_a^x g$ can be regarded as a condition on the measure
$\mu(A) = \int_A |g|$, namely, $\mu(A) < \varepsilon$ whenever $m(A) < \delta$, or $\mu(A) \to 0$ as $m(A) \to 0$.
In this sense, absolute continuity is a continuity property of (certain) measures. (See
Exercise 18 for a related condition.)

From Theorem 20.9, indefinite integrals are not only continuous, but also of bounded
variation. In fact, the same can be said of any absolutely continuous function.


Proposition 20.15. (i) If $f \in AC[a,b]$, then $f \in C[a,b] \cap BV[a,b]$. (ii) $f \in
AC[a,b]$ if and only if $v(x) = V_a^x f \in AC[a,b]$.

Proof. We have already noted the inclusion $AC[a,b] \subset C[a,b]$. Thus, we need only show that $AC[a,b] \subset BV[a,b]$. To this end, let $f \in AC[a,b]$, and choose $\delta > 0$ to correspond to the choice $\varepsilon = 1$ in the definition of absolute continuity.

We first note that if $[c,d] \subset [a,b]$ with $d - c < \delta$, then $V_c^d f \le 1$. Indeed, no matter how we might partition $[c,d] = \bigcup_{i \ge 1} [a_i, b_i]$ into nonoverlapping intervals, we always have $\sum_{i \ge 1} (b_i - a_i) = d - c < \delta$, and hence $\varepsilon = 1$ is always an upper bound for $\sum_{i \ge 1} |f(b_i) - f(a_i)|$. Thus, if we now partition $[a,b]$ into $N = 1 + \lfloor (b-a)/\delta \rfloor$ subintervals $\{[c_i, d_i]\}_{i=1}^N$, each of length less than $\delta$ (for example, each of length $(b-a)/N$), then we would have

$$V_a^b f \le \sum_{i=1}^N V_{c_i}^{d_i} f \le N.$$

This proves (i).

But our proof of (i) actually shows much more: If $f$ is absolutely continuous, and if $\{[a_i, b_i]\}_{i \ge 1}$ is any sequence of nonoverlapping intervals with $\sum_{i \ge 1} (b_i - a_i) < \delta$, where $\delta > 0$ corresponds to a given $\varepsilon > 0$ in the definition of absolute continuity for $f$, then we must have $\sum_{i \ge 1} V_{a_i}^{b_i} f \le \varepsilon$. Indeed, even if each $[a_i, b_i]$ is further partitioned, the collection of new, smaller subintervals would still have total measure less than $\delta$. Thus, $v(x) = V_a^x f \in AC[a,b]$. That $f \in AC[a,b]$ whenever $v \in AC[a,b]$ is obvious since $|f(b_i) - f(a_i)| \le V_{a_i}^{b_i} f = |v(b_i) - v(a_i)|$. This proves (ii). □
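To see the counting argument of (i) with actual numbers (a worked example of ours), take $f(x) = \sqrt{x}$ on $[0,4]$. Since $1/(2\sqrt{x})$ is decreasing, any nonoverlapping intervals in $[0,4]$ with total length less than 1 satisfy $\sum |f(b_i) - f(a_i)| \le \sqrt{\sum (b_i - a_i)} < 1$, so $\delta = 1$ corresponds to $\varepsilon = 1$. The proof then gives

$$N = 1 + \left\lfloor \frac{4 - 0}{1} \right\rfloor = 5, \qquad V_0^4 f \le \sum_{i=1}^{5} V_{c_i}^{d_i} f \le 5,$$

using five subintervals of length $4/5 < 1$. This is consistent with the actual value $V_0^4 f = f(4) - f(0) = 2$.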


EXERCISES

▷ 14. Check that any Lipschitz function on $[a,b]$ is absolutely continuous.

15. Show that the Cantor function is not absolutely continuous on $[0,1]$. [Hint: Recall Example 20.13.] Conclude that the inclusion $AC[a,b] \subset C[a,b] \cap BV[a,b]$ is proper.

16. Check that $AC[a,b]$ is a subspace and a subalgebra of $C[a,b]$. Is it a sublattice? [Hint: If $f \in AC[a,b]$, is $|f| \in AC[a,b]$?] Is it closed? Explain.

▷ 17. Prove that $f \in AC[a,b]$ if and only if $f$ can be written as the difference of two increasing, absolutely continuous functions.

18. If $f : [a,b] \to \mathbb{R}$ is increasing and absolutely continuous, prove that $m(f(E)) = 0$ whenever $E \subset [a,b]$ has $m(E) = 0$. [Hint: If $E \subset \bigcup_{i \ge 1} (a_i, b_i)$, then $f(E) \subset \bigcup_{i \ge 1} [f(a_i), f(b_i)]$.]

19. If $f$ is continuous on $[a,b]$, and if $m^*(f(E)) > 0$ for some null set $E \subset [a,b]$, prove that $f(A)$ is nonmeasurable for some (measurable) $A \subset E$.

20. If $f \in C[a,b]$, show that the following are equivalent (for all $E \subset [a,b]$):
(i) $m(E) = 0 \implies m(f(E)) = 0$.
(ii) $E$ measurable $\implies f(E)$ measurable.
[Hint: For (i) implies (ii), note that $f$ maps $F_\sigma$-sets to $F_\sigma$-sets. For (ii) implies (i), use Exercise 19.]

21. You will find a variety of seemingly different definitions for absolute continuity in other textbooks. Check that each of the following statements is equivalent to our definition of absolute continuity.

(a) $\forall \varepsilon > 0$, $\exists \delta > 0$ such that $\sum_{i=1}^n |f(b_i) - f(a_i)| < \varepsilon$ whenever $\{(a_i, b_i)\}_{i=1}^n$ are finitely many disjoint subintervals of $[a,b]$ with $\sum_{i=1}^n (b_i - a_i) < \delta$.

(b) $\forall \varepsilon > 0$, $\exists \delta > 0$ such that $\left| \sum_{i \ge 1} [f(b_i) - f(a_i)] \right| < \varepsilon$ whenever $\{(a_i, b_i)\}$ are disjoint subintervals of $[a,b]$ with $\sum_{i \ge 1} |b_i - a_i| < \delta$.

(c) $\forall \varepsilon > 0$, $\exists \delta > 0$ such that $\sum_{i \ge 1} V_{a_i}^{b_i} f < \varepsilon$ whenever $\{(a_i, b_i)\}$ are disjoint subintervals of $[a,b]$ with $\sum_{i \ge 1} |b_i - a_i| < \delta$.

(d) $\forall \varepsilon > 0$, $\exists \delta > 0$ such that $\sum_{i \ge 1} \omega(f; [a_i, b_i]) < \varepsilon$ whenever $\{(a_i, b_i)\}$ are disjoint subintervals of $[a,b]$ with $\sum_{i \ge 1} |b_i - a_i| < \delta$. [Recall that $\omega(f; I)$ is the oscillation of $f$ on $I$.]
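The computation behind Exercise 15 can be checked directly. The sketch below (our own illustration) evaluates the Cantor function $c$ exactly with Python's `fractions.Fraction`, using the standard ternary-digit description of $c$, and adds up $|c(b_i) - c(a_i)|$ over the $2^n$ closed intervals $[a_i, b_i]$ that remain after $n$ steps of the Cantor construction. Their total length is $(2/3)^n \to 0$, yet the sum is always exactly 1, so no $\delta$ can serve for $\varepsilon = 1$. The function names and the `max_digits` cutoff are our own choices.

```python
import math
from fractions import Fraction

def cantor(x: Fraction, max_digits: int = 200) -> Fraction:
    """Cantor function on [0, 1], computed from the ternary digits of x.

    Exact whenever x has a terminating ternary expansion of at most
    max_digits digits (true for every endpoint used below); otherwise
    the result is a truncated approximation.
    """
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        x *= 3
        digit = math.floor(x)      # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:             # x lies in (the closure of) a removed middle third
            return value + weight
        if digit == 2:             # a ternary 2 becomes a binary 1
            value += weight
        if x == 0:                 # the expansion has terminated
            return value
        weight /= 2
    return value

def stage_intervals(n: int):
    """The 2**n closed intervals left after n steps of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined += [(a, a + third), (b - third, b)]
        intervals = refined
    return intervals

for n in (1, 4, 8, 12):
    ivs = stage_intervals(n)
    length = sum(b - a for a, b in ivs)
    increments = sum(abs(cantor(b) - cantor(a)) for a, b in ivs)
    print(f"n = {n:2d}: total length = {float(length):.6f}, sum of increments = {increments}")
```

Since $c$ is also singular and nonconstant, the same example shows why Theorem 20.16 needs the hypothesis of absolute continuity.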


It follows from Proposition 20.15 that each absolutely continuous function $f$ is differentiable a.e. and, from Theorem 20.8, that $f'$ is even integrable. Thus it makes sense to ask whether $f(x) = f(a) + \int_a^x f'$ holds. To attack this problem, notice that if we set $g(x) = \int_a^x f'$, then Theorem 20.9 tells us that $g$ is differentiable a.e. and satisfies $g' = f'$ a.e. All that remains is to show that this last condition forces $f = g + C$ for some constant $C$.

Theorem 20.16. If $f \in AC[a,b]$ and if $f' = 0$ a.e., then $f$ is constant on $[a,b]$. Thus, if $f, g \in AC[a,b]$ satisfy $f' = g'$ a.e., then $f - g$ is constant on $[a,b]$.

Proof. Let $f \in AC[a,b]$ with $f' = 0$ a.e., and fix $a < x \le b$. We will prove that $f(x) = f(a)$. To this end, fix $\varepsilon > 0$.

Let $E = \{y \in [a,x] : f'(y) = 0\}$. Please note that $E$ is measurable and that $m([a,x] \setminus E) = 0$. For any point $y \in E$, the fact that $f'(y) = 0$ means that there are arbitrarily small closed intervals $[c,d]$ containing $y$ such that

$$\left| \frac{f(d) - f(c)}{d - c} \right| < \varepsilon.$$

Thus, the collection of closed intervals

$$\mathcal{C} = \left\{ [c,d] : [c,d] \subset [a,x] \text{ and } \left| \frac{f(d) - f(c)}{d - c} \right| < \varepsilon \right\}$$

is a Vitali cover for $E$.

Now choose $\delta > 0$ to work in the definition of absolute continuity for $f$ (for this same $\varepsilon$). Then, by Corollary 16.28, there are finitely many disjoint intervals $\{[c_i, d_i]\}_{i=1}^n$ in $\mathcal{C}$ such that

$$m\Big(E \setminus \bigcup_{i=1}^n [c_i, d_i]\Big) = m\Big([a,x] \setminus \bigcup_{i=1}^n [c_i, d_i]\Big) = (x - a) - \sum_{i=1}^n (d_i - c_i) < \delta.$$

But notice that, if we label the intervals so that $c_1 < d_1 < c_2 < d_2 < \cdots < c_n < d_n$, then

$$[a,x] \setminus \bigcup_{i=1}^n [c_i, d_i] = [d_0, c_1) \cup (d_1, c_2) \cup (d_2, c_3) \cup \cdots \cup (d_n, c_{n+1}],$$

where $d_0 = a$ and $c_{n+1} = x$ (if necessary). Hence,

$$\sum_{i=1}^{n+1} (c_i - d_{i-1}) = (x - a) - \sum_{i=1}^n (d_i - c_i) < \delta.$$

That is, we have partitioned $[a,x]$ into two sets of intervals: $\{[c_i, d_i]\}_{i=1}^n$, taken from $\mathcal{C}$, and $\{(d_{i-1}, c_i)\}_{i=1}^{n+1}$, which have small total measure. Now we use the triangle inequality to estimate

$$\begin{aligned}
|f(x) - f(a)| &\le \sum_{i=1}^n |f(d_i) - f(c_i)| + \sum_{i=1}^{n+1} |f(c_i) - f(d_{i-1})| \\
&\le \varepsilon \sum_{i=1}^n (d_i - c_i) + \varepsilon \le \varepsilon\big((b - a) + 1\big).
\end{aligned}$$

Since $\varepsilon$ is arbitrary, we have $f(x) = f(a)$. □


A function satisfying $f' = 0$ a.e. is called singular. Theorem 20.16 says that a function that is simultaneously absolutely continuous and singular must be constant.

Corollary 20.17. Let $f : [a,b] \to \mathbb{R}$.

(i) $f \in AC[a,b]$ if and only if $f(x) = C + \int_a^x g$ for some constant $C$ and some $g \in L_1[a,b]$.

(ii) $f$ is Lipschitz if and only if $f(x) = C + \int_a^x g$ for some constant $C$ and some $g \in L_\infty[a,b]$.

(iii) Each $f \in BV[a,b]$ may be written as $f = g + h$, where $g \in AC[a,b]$, $g(a) = 0$, and where $h$ is singular.
Proof. We have already talked our way through (i): If $f \in AC[a,b]$, and if we set $h(x) = \int_a^x f'$, then $h' = f'$ a.e. Hence, by Theorem 20.16, $f(x) = C + h(x)$; evaluating at $x = a$ gives $C = f(a)$, so that $f(x) = f(a) + \int_a^x f'$. The other implication is supplied by Lemma 20.14. The proof of (ii) is left as an exercise (Exercise 22). For (iii), notice that if $f \in BV[a,b]$, then $f' \in L_1[a,b]$. Hence, if we set $g(x) = \int_a^x f'$, then $g \in AC[a,b]$, $g(a) = 0$, $g' = f'$ a.e., and $h = f - g$ satisfies $h' = 0$ a.e. □

When rewritten, Corollary 20.17 (i) will provide a missing detail from Chapter Thirteen along with an alternate version of Proposition 20.15 (ii).

Corollary 20.18. Let $f : [a,b] \to \mathbb{R}$. Then, the following are equivalent:

(i) $f \in AC[a,b]$.

(ii) $f'$ exists a.e., $f' \in L_1[a,b]$, and $f(x) = f(a) + \int_a^x f'$.

(iii) $f'$ exists a.e., $f' \in L_1[a,b]$, and $v(x) = V_a^x f = \int_a^x |f'|$.

(iv) $v \in AC[a,b]$.

Proof. That (i) implies (ii) is clear. The proof that (ii) implies (iii) follows from Theorem 20.9 (iii) and the fact that $V_a^x f = V_a^x (f - f(a)) = \int_a^x |f'|$. That (iii) implies (iv) is dead easy: If $v$ is an indefinite integral, then $v \in AC[a,b]$. Finally, the fact that (iv) implies (i) is obvious since $|f(x) - f(y)| \le V_y^x f = |v(x) - v(y)|$. □
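Condition (iii) can be sanity-checked numerically. For $f(x) = \sin(2\pi x)$ on $[0,1]$ (our own choice of example) we have $V_0^1 f = 4 = \int_0^1 |f'|$. The sketch compares the variation of $f$ along a uniform partition with a midpoint-rule approximation of $\int_0^1 |f'|$; the grid size $n = 1000$ is arbitrary, though it happens to contain the turning points $1/4$ and $3/4$, so the partition sum is essentially exact.

```python
import math

def f(x: float) -> float:
    return math.sin(2 * math.pi * x)

def f_prime(x: float) -> float:
    return 2 * math.pi * math.cos(2 * math.pi * x)

def partition_variation(func, a: float, b: float, n: int) -> float:
    """Sum of |func(x_i) - func(x_{i-1})| over the uniform partition of [a, b] into n pieces."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(func(x1) - func(x0)) for x0, x1 in zip(xs, xs[1:]))

def midpoint_integral(func, a: float, b: float, n: int) -> float:
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

n = 1000
v = partition_variation(f, 0.0, 1.0, n)
integral = midpoint_integral(lambda x: abs(f_prime(x)), 0.0, 1.0, n)
print(f"partition variation = {v:.8f}, integral of |f'| = {integral:.8f}")
```

Coarser partitions can only underestimate the variation, in line with the definition of $V_0^1 f$ as a supremum.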


EXERCISES

▷ 22. Prove Corollary 20.17 (ii).

23. Prove that the decomposition in Corollary 20.17 (iii) is unique.

24. If $f \in AC[a,b]$ satisfies $f' \ge 0$ a.e., show that $f$ is increasing.


In Chapter Fourteen we raised the question of when a Riemann-Stieltjes integral $\int_a^b f\,dg$ was equal to the Riemann integral $\int_a^b f g'$ (recall Theorem 14.17). We take this one step further and now consider $\int_a^b f g'$ as a Lebesgue integral.


Theorem 20.19. If $f \in C[a,b]$ and $g \in AC[a,b]$, then

$$(RS)\int_a^b f\,dg = \int_a^b f g'.$$


Proof. We want to compare $\int_a^b f g'$ to a typical Riemann-Stieltjes sum for $\int_a^b f\,dg$, say

$$S_g(f, P, T) = \sum_{i=1}^n f(t_i)\,[g(x_i) - g(x_{i-1})].$$

Since $g$ is absolutely continuous, we may write $g(x_i) - g(x_{i-1}) = \int_{x_{i-1}}^{x_i} g'$, and hence

$$S_g(f, P, T) = \sum_{i=1}^n \int_{x_{i-1}}^{x_i} f(t_i)\,g'(x)\,dx.$$

Consequently,

$$\begin{aligned}
\left| S_g(f, P, T) - \int_a^b f(x)\,g'(x)\,dx \right|
&= \left| \sum_{i=1}^n \int_{x_{i-1}}^{x_i} [f(t_i) - f(x)]\,g'(x)\,dx \right| \\
&\le \sum_{i=1}^n \omega(f; [x_{i-1}, x_i]) \int_{x_{i-1}}^{x_i} |g'(x)|\,dx \\
&\le \alpha \int_a^b |g'(x)|\,dx,
\end{aligned}$$

where $\alpha = \max_{1 \le i \le n} \omega(f; [x_{i-1}, x_i])$. Since $f$ is (uniformly) continuous on $[a,b]$, $\alpha \to 0$ as $\max_i (x_i - x_{i-1}) \to 0$. This proves that the norm integral $(N)\int_a^b f\,dg$ equals $\int_a^b f g'$. Since $g$ is continuous, $(N)\int_a^b f\,dg = (RS)\int_a^b f\,dg$ (see Theorem 14.26). □
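A numerical check of Theorem 20.19 with an integrator whose derivative is unbounded (our own example): take $f(x) = \cos x$ and $g(x) = \sqrt{x}$ on $[0,1]$. Then $\int_0^1 |g'| = 1$, so the proof's estimate bounds the error of any Riemann-Stieltjes sum by $\alpha \le \sin(1)/n$ on the uniform partition into $n$ pieces. The Lebesgue integral $\int_0^1 \cos(x)/(2\sqrt{x})\,dx$ becomes $\int_0^1 \cos(u^2)\,du$ after the substitution $x = u^2$, which we approximate with Simpson's rule. Midpoint tags are an arbitrary choice.

```python
import math

def f(x: float) -> float:
    return math.cos(x)

def g(x: float) -> float:
    # absolutely continuous on [0, 1]; g'(x) = 1/(2*sqrt(x)) is integrable but unbounded
    return math.sqrt(x)

def rs_sum(n: int) -> float:
    """Riemann-Stieltjes sum S_g(f, P, T): P is the uniform partition of [0, 1]
    into n pieces and each tag t_i is the midpoint of [x_{i-1}, x_i]."""
    xs = [i / n for i in range(n + 1)]
    return sum(f((x0 + x1) / 2) * (g(x1) - g(x0)) for x0, x1 in zip(xs, xs[1:]))

def simpson(func, a: float, b: float, n: int) -> float:
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    total += 4 * sum(func(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(func(a + i * h) for i in range(2, n, 2))
    return total * h / 3

# Lebesgue integral of f g' = cos(x) / (2 sqrt(x)) over [0, 1]; substituting
# x = u**2 turns it into the integral of cos(u**2) over [0, 1], which is smooth.
lebesgue = simpson(lambda u: math.cos(u * u), 0.0, 1.0, 1000)

for n in (10, 100, 1000, 10000):
    err = abs(rs_sum(n) - lebesgue)
    # estimate from the proof: err <= alpha * (integral of |g'|) = alpha <= sin(1)/n
    print(f"n = {n:5d}: |S - integral| = {err:.2e}  (bound {math.sin(1) / n:.2e})")
```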


As an immediate corollary we get

Corollary 20.20. If $f, g \in AC[a,b]$, then

$$\int_a^b f g' + \int_a^b g f' = f(b)g(b) - f(a)g(a).$$
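For example (our own illustration), $f(x) = g(x) = \sqrt{x}$ are absolutely continuous on $[0,1]$ with unbounded derivatives, yet $f g' = g f' = 1/2$ a.e., and the formula checks out:

$$\int_0^1 \sqrt{x} \cdot \frac{1}{2\sqrt{x}}\,dx + \int_0^1 \sqrt{x} \cdot \frac{1}{2\sqrt{x}}\,dx = \frac12 + \frac12 = 1 = f(1)g(1) - f(0)g(0).$$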

Corollary 20.17 and Theorem 20.19 shed new light on the nature of Riemann-Stieltjes integration against integrators of bounded variation. Recall from Chapter Fourteen that each function $g \in BV[a,b]$ may be written as the sum of a continuous function of bounded variation $g_c$ and a saltus or “pure jump” function $g_s$. Clearly, any saltus function is also singular. (Why?) Corollary 20.17 tells us that we may further decompose $g_c$ into an absolutely continuous part $g_{ac}$ and a continuous, singular part $g_{cs}$. That is,

$$g = g_{ac} + g_{cs} + g_s.$$

Theorem 20.19 tells us that integration against $g_{ac}$ reduces to a Lebesgue integral and, as we saw in Chapter Fourteen, integration against $g_s$ reduces to an infinite series. All of the fuss and botheration comes from integration against $g_{cs}$, the “Cantor function-like” part of $g$.
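The fuss is easy to witness with the Cantor function $c$, which is continuous and singular. Since $c' = 0$ a.e., the Lebesgue integral $\int_0^1 x\,c'(x)\,dx$ is $0$, but the Riemann-Stieltjes sums for $(RS)\int_0^1 x\,dc(x)$ tell a different story. The sketch below (our own illustration, using exact `Fraction` arithmetic and the ternary-digit rule for $c$) forms those sums over the uniform partition of $[0,1]$ into $3^n$ pieces with midpoint tags. Each sum equals $1/2$ exactly: $c$ rises by $2^{-n}$ across each of the $2^n$ intervals left at stage $n$ of the Cantor construction, is constant on every other cell, and those intervals are placed symmetrically about $1/2$. Integration by parts gives the same value, $1 - \int_0^1 c = 1/2$.

```python
import math
from fractions import Fraction

def cantor(x: Fraction, max_digits: int = 200) -> Fraction:
    """Cantor function via ternary digits; exact for terminating ternary expansions."""
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        x *= 3
        digit = math.floor(x)      # next ternary digit
        x -= digit
        if digit == 1:             # inside (the closure of) a removed middle third
            return value + weight
        if digit == 2:
            value += weight
        if x == 0:
            return value
        weight /= 2
    return value

def rs_sum_x_dc(n: int) -> Fraction:
    """Riemann-Stieltjes sum for (RS) integral of x dc(x) over [0, 1]:
    uniform partition into 3**n pieces, tags at the midpoints, exact arithmetic."""
    m = 3 ** n
    xs = [Fraction(i, m) for i in range(m + 1)]
    cs = [cantor(x) for x in xs]
    return sum((x0 + x1) / 2 * (c1 - c0)
               for x0, x1, c0, c1 in zip(xs, xs[1:], cs, cs[1:]))

for n in range(1, 7):
    print(n, rs_sum_x_dc(n))   # compare: the Lebesgue integral of x * c'(x) is 0
```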

Finally, we offer another description of $AC[a,b]$ that leads to an easy proof that $AC[a,b]$ is a Banach space under the norm $\|f\|_{BV} = |f(a)| + V_a^b f$. That is, $AC[a,b]$ is a closed subspace (and even a subalgebra) of $BV[a,b]$. We will use the characterization given in Corollary 20.17 to write $AC[a,b] = L_1[a,b] \oplus \mathbb{R}$.

To simplify the notation, we normalize by considering

$$AC_0[a,b] = \{ f \in AC[a,b] : f(a) = 0 \}.$$
Clearly, $AC_0[a,b]$ is a subspace of $AC[a,b]$ and, for $f \in AC_0[a,b]$, the norm simplifies to $\|f\|_{BV} = V_a^b f = \int_a^b |f'|$. If we define a norm on the space $\mathbb{R} \oplus AC_0[a,b]$ by setting $\|(r, f)\| = |r| + V_a^b f$, it then follows that

$$AC[a,b] = \mathbb{R} \oplus AC_0[a,b],$$

isometrically, under the linear map $f \mapsto (f(a), f - f(a))$.
Next we define the map

$$T : L_1[a,b] \to AC_0[a,b] \quad \text{by} \quad (Tg)(x) = \int_a^x g.$$

That is, $Tg = f$, where $f(x) = \int_a^x g$. Obviously, $T$ is linear and onto (since $T(f') = f$ for each $f \in AC_0[a,b]$). Also, $T$ is one-to-one because, in fact, $T$ is an isometry (recall that $f' = g$ a.e.):

$$\|Tg\|_{BV} = \|f\|_{BV} = V_a^b f = \int_a^b |f'| = \int_a^b |g| = \|g\|_1.$$

(By the way, what is $T^{-1}$?) Thus,

$$AC[a,b] = \mathbb{R} \oplus AC_0[a,b] = \mathbb{R} \oplus L_1[a,b],$$

isometrically. Since $L_1[a,b]$ is complete, it follows that $AC_0[a,b]$ must also be complete, and from this it follows easily that $AC[a,b]$ is complete.

Notice, too, that the map $T$ not only preserves the lattice structure of $L_1[a,b]$, but it also carries an extra feature that you might not expect: If $g \ge 0$ a.e., then $Tg \ge 0$, of course, but also

$$g \ge 0 \text{ a.e.} \iff Tg \text{ is increasing}.$$

In fact, the lattice decomposition $g = g^+ - g^-$ in $L_1$ transforms into the Jordan decomposition $f = p - n$, where $f \in AC_0[a,b]$ is written as the difference of its positive and negative variations (see Exercise 26).
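When $g$ is a step function, all of this can be computed by hand, and the sketch below does so exactly with `Fraction` (the data are made up for illustration). A step function is described by breakpoints $a = x_0 < \cdots < x_k = b$ and a value on each $(x_{i-1}, x_i)$; then $Tg$ is the polygonal function through the nodes $(x_i, \sum_{j \le i} s_j (x_j - x_{j-1}))$. Its variation is the sum of the absolute changes between consecutive nodes, which is exactly $\|g\|_1$, and $Tg = T(g^+) - T(g^-)$ with both pieces increasing, as in Exercise 26. (This is also the idea behind the hint to Exercise 27.) The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def tg_nodes(breaks, values):
    """Nodes (x_i, (Tg)(x_i)) of Tg, where (Tg)(x) integrates g from breaks[0] to x
    and g equals values[i] on (breaks[i], breaks[i+1]); Tg is linear between nodes."""
    increments = [v * (b1 - b0) for v, b0, b1 in zip(values, breaks, breaks[1:])]
    return list(zip(breaks, accumulate(increments, initial=Fraction(0))))

def polygon_variation(nodes):
    """Total variation of the polygonal function through the given nodes."""
    return sum(abs(y1 - y0) for (_, y0), (_, y1) in zip(nodes, nodes[1:]))

def l1_norm(breaks, values):
    """L_1 norm of the step function."""
    return sum(abs(v) * (b1 - b0) for v, b0, b1 in zip(values, breaks, breaks[1:]))

# Hypothetical step function g on [0, 1].
breaks = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
values = [Fraction(2), Fraction(-3), Fraction(1), Fraction(-1, 2)]

nodes = tg_nodes(breaks, values)
p = tg_nodes(breaks, [max(v, 0) for v in values])    # T(g^+)
n = tg_nodes(breaks, [max(-v, 0) for v in values])   # T(g^-)

print("||Tg||_BV =", polygon_variation(nodes), "  ||g||_1 =", l1_norm(breaks, values))
print("Tg = T(g+) - T(g-) at every node:",
      all(y == yp - yn for (_, y), (_, yp), (_, yn) in zip(nodes, p, n)))
```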

Finally, by applying a similar line of reasoning to $BV[a,b]$ we could restate Corollary 20.17 (iii) by writing

$$BV[a,b] = AC_0[a,b] \oplus BV_s[a,b] = L_1[a,b] \oplus BV_s[a,b],$$

where $BV_s[a,b]$ denotes the subspace of singular functions in $BV[a,b]$ (which includes both the constant functions and the saltus functions). That is, each $f \in BV[a,b]$ can be written as $f(x) = \int_a^x f' + h(x)$, where $h$ is singular.


EXERCISES

25. Prove that the Lipschitz functions on $[a,b]$ are dense in $AC[a,b]$ under the variation norm. [Hint: Corollary 20.17 (ii).]

▷ 26. If $f(x) = \int_a^x g$, where $g \in L_1[a,b]$, prove that the positive and negative variations of $f$ are given by $p(x) = \int_a^x g^+$ and $n(x) = \int_a^x g^-$.

27. Let $PL[a,b]$ denote the subspace of all continuous, piecewise linear functions in $AC[a,b]$ (i.e., the polygonal functions), and let $S[a,b]$ denote the step functions on $[a,b]$. Use the fact that $S[a,b]$ is dense in $L_1[a,b]$ to prove that $PL[a,b]$ is dense in $AC[a,b]$. [Hint: Show that the map $(Tg)(x) = \int_a^x g$ carries $S[a,b]$ onto $PL[a,b]$.]




Notes and Remarks 


There is a wealth of literature on differentiation, which is testament to the fact that it is 
a complex and delicate subject. For an extensive survey of results, see Bruckner [1994]. 
For more on the history of the results in this chapter, see Hawkins [1970]. 

The material in this chapter is largely based on the presentation in Natanson [1955]. In particular, we have followed Natanson’s lead by opting for the efficacy of derived numbers in our attack on Lebesgue’s differentiation theorem rather than the more commonplace Dini derivatives. The Dini derivatives of $f$ at $x$, defined by

$$D^+ f(x) = \limsup_{h \to 0^+} \frac{f(x+h) - f(x)}{h}, \qquad D_+ f(x) = \liminf_{h \to 0^+} \frac{f(x+h) - f(x)}{h},$$

$$D^- f(x) = \limsup_{h \to 0^-} \frac{f(x+h) - f(x)}{h}, \qquad D_- f(x) = \liminf_{h \to 0^-} \frac{f(x+h) - f(x)}{h},$$

were introduced by (and named after) Ulisse Dini [1878].
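For example (our own illustration), let $f(x) = x \sin(1/x)$ for $x \ne 0$ and $f(0) = 0$. At $x = 0$ the difference quotient is $(f(h) - f(0))/h = \sin(1/h)$, which oscillates between $-1$ and $1$ on both sides of $0$, so

$$D^+ f(0) = D^- f(0) = 1, \qquad D_+ f(0) = D_- f(0) = -1,$$

and the unequal Dini derivatives record the fact that $f$ is not differentiable at $0$.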

Nearly all of the main results in this chapter, including Theorems 20.6, 20.8, 20.9, 20.16 and Corollaries 20.7, 20.11, 20.12, and 20.17, are due to Lebesgue, from roughly 1903 to 1907, and most appeared in the first edition of the Leçons in 1904, although not in their current form. Lebesgue’s original version of Theorem 20.6, for example, also required that the function $f$ be continuous; this restriction was later shown to be unnecessary (see Lebesgue [1928]). The term “absolutely continuous” was introduced by Vitali [1905b], who published the first proof of Corollary 20.17 (i); Lebesgue [1907] later gave his own proof (essentially the one given here).

The discussion of Corollary 20.17 in terms of Banach space decompositions is based
in part on my notes from a course on real analysis offered by W. B. Johnson at The 
Ohio State University in 1974-1975. 

For other presentations of Lebesgue’s differentiation theorem see, for example, Riesz 
and Sz.-Nagy [1955], Taylor [1965], or Chae [1980] (for proofs of Theorem 20.6 not 
requiring the Vitali Covering Theorem), and Austin [1965] (for a geometric proof of 
Theorem 20.6). For a proof that $v'(x) = |f'(x)|$ a.e. for $f \in BV[a,b]$, see Wheeden and Zygmund [1977]. For an elementary proof that $m^*(f(E)) \to 0$ as $m^*(E) \to 0$ for $f \in AC[a,b]$, see Łojasiewicz [1988].

For an extensive discussion of absolute continuity and an elementary proof of the 
Banach-Zarecki theorem, which states that a continuous function of bounded variation
is absolutely continuous if and only if it maps null sets to null sets, see Varberg [1967] 
(or Torchinsky [1988]). Varberg’s proof is based on the following lemmas, which are 
of independent interest: 
Lemma A. Let $f : [a,b] \to \mathbb{R}$. If $f'(x)$ exists and satisfies $|f'(x)| \le K$ for all $x$ in $E \subset [a,b]$, then $m^*(f(E)) \le K\,m^*(E)$.

Lemma B. Let $f : [a,b] \to \mathbb{R}$ be measurable, and let $E \subset [a,b]$ be measurable. If $f'(x)$ exists (as a finite real number) for all $x$ in $E$, then $m^*(f(E)) \le \int_E |f'|$.

The Banach-Zarecki theorem is immediate from Lemma B (and the ideas found in 
Exercises 18-20, for example). Lemma B, in turn, is not hard to deduce from Lemma
A. The fighting takes place in modifying the proof of Lemma 20.2 to work in the setting 
of Lemma A. Compare the statement of Lemma A with (the much simpler) Exercise 8. 